ROP (H. Shacham, ACM CCS 2007)
- generalizes return-to-libc by chaining returns
- the stack is overwritten so that returns go from one piece of libc code to another; control flow is given by the stack contents
- pieces ("gadgets") are chosen to contain useful instructions
- libc contains enough gadgets for arbitrary programs (Turing-complete)
- ROP "compilers" can produce arbitrary code

Figure (Schwartz, Avgerinos, Brumley, USENIX 2011): exploit stack and gadgets. The stack holds, in the order consumed: addr1, value, addr2, address, addr3, next addr. Gadgets: addr1: pop %eax; ret. addr2: pop %ebx; ret. addr3: movl %eax, (%ebx); ret.
Szekeres et al., SoK: Eternal War in Memory

Formal verification, Lecture 7, Marius Minea
Comparing models. Abstraction. Compositional reasoning

Language inclusion preserves LTL properties: if $\mathcal{L}(M) \subseteq \mathcal{L}(S)$, then $\forall f \in \mathrm{LTL}$: $S \models f \Rightarrow M \models f$.

Consider two structures $M$ and $M'$, with $AP \supseteq AP'$.
A relation $\preceq\ \subseteq S \times S'$ is a simulation relation between $M$ and $M'$ iff for all $s \preceq s'$:
- $L(s) \cap AP' = L'(s')$ ($s$ and $s'$ are labeled identically with respect to $AP'$)
- for every $s_1$ with $s \to s_1$ there exists $s_1'$ with $s' \to s_1'$ and $s_1 \preceq s_1'$ (any successor of $s$ is simulated by a successor of $s'$)
The structure $M'$ simulates $M$ ($M \preceq M'$) if there exists a simulation relation $\preceq$ such that for the initial states: $\forall s_0 \in S_0\ \exists s_0' \in S_0'$: $s_0 \preceq s_0'$.
Prop.: The simulation relation is a preorder over the set of structures (reflexive and transitive). For transitivity we choose: $s \preceq s'' \Leftrightarrow \exists s'$: $s \preceq_1 s' \wedge s' \preceq_2 s''$.
Theorem: if $M \preceq M'$, then $M' \models f \Rightarrow M \models f$, for any ACTL* formula $f$ over $AP'$.

Let $M$ and $M'$ be two structures with $AP' = AP$.
A relation $\equiv\ \subseteq S \times S'$ is a bisimulation relation between $M$ and $M'$ iff for all $s, s'$ with $s \equiv s'$:
- $L(s) = L'(s')$
- for every $s_1$ with $s \to s_1$ there exists $s_1'$ with $s' \to s_1'$ and $s_1 \equiv s_1'$
- for every $s_1'$ with $s' \to s_1'$ there exists $s_1$ with $s \to s_1$ and $s_1 \equiv s_1'$
(or: $\equiv$ is a simulation relation between $M$ and $M'$ and, read in the opposite direction, between $M'$ and $M$)
Structures $M$ and $M'$ are bisimilar if there exists a bisimulation relation $\equiv$ such that for the initial states: $\forall s_0 \in S_0\ \exists s_0' \in S_0'$: $s_0 \equiv s_0'$, and $\forall s_0' \in S_0'$
$\exists s_0 \in S_0$: $s_0 \equiv s_0'$.
Prop.: The bisimulation relation is an equivalence relation among structures.
Theorem: if $M \equiv M'$ then $\forall f \in \mathrm{CTL}^*$: $M \models f \Leftrightarrow M' \models f$.
Conversely: two structures that satisfy the same CTL* (or even CTL) formulas are bisimilar (equivalently: two structures which are not bisimilar can be distinguished by a CTL formula).

Generally: $M \preceq M' \Rightarrow \mathcal{L}(M)|_{AP'} \subseteq \mathcal{L}(M')$
In the figure: $\mathcal{L}(M_1) = \mathcal{L}(M_2)$ and $M_1 \preceq M_2$, but $M_2 \not\preceq M_1$.
Equivalent definition (game theory): $M \preceq M'$ if any move in $M$ can be matched by an equally labelled move in $M'$.

Generally: $M \equiv M' \Rightarrow M \preceq M' \wedge M' \preceq M$
In the figure: $M_1 \preceq M_2$ and $M_2 \preceq M_1$, but $M_1 \not\equiv M_2$.
Equivalent definition (as a game): $M \equiv M'$ if any choice of a model and of a move in it can be matched by an equally labelled move in the other model (the choice of model is made at each step => symmetry).
$M_1 \equiv M_2$ (duplicating nodes does not change branching properties)

The relation $\preceq_F\ \subseteq S \times S'$ is a fair simulation relation between $M$ and $M'$ (with $AP' \subseteq AP$) iff for all $s \preceq_F s'$:
- $L(s) \cap AP' = L'(s')$
- for any path $\pi = s\,s_1 s_2 \ldots$ in $M$ there exists a fair path $\pi' = s'\,s_1' s_2' \ldots$ in $M'$ such that $\forall i > 0$: $s_i \preceq_F s_i'$

Deterministic system: $s \to s_1 \wedge s \to s_2 \wedge s_1 \neq s_2 \Rightarrow L(s_1) \neq L(s_2)$
$M, M'$ deterministic: $M \preceq M' \Leftrightarrow \mathcal{L}(M) \subseteq \mathcal{L}(M')$
In general, we define recursively:
$s \preceq_0 s' \Leftrightarrow L(s) \cap AP' = L'(s')$
$s \preceq_{n+1} s' \Leftrightarrow s \preceq_n s' \wedge \forall s_1\,[\,s \to s_1 \Rightarrow \exists s_1'\ s' \to s_1' \wedge s_1 \preceq_n s_1'\,]$
We have $\preceq_{n+1}\ \subseteq\ \preceq_n$ => $\exists n$: $\preceq_n\ =\ \preceq_{n+1}\ =\ \preceq$ (finite models)

$M, M'$ deterministic: $M \equiv M' \Leftrightarrow \mathcal{L}(M) = \mathcal{L}(M')$
In general, we define recursively:
$s \equiv_0 s' \Leftrightarrow L(s) = L'(s')$
$s \equiv_{n+1} s' \Leftrightarrow s \equiv_n s' \wedge \forall s_1\,[\,s \to s_1 \Rightarrow \exists s_1'\ s' \to s_1' \wedge s_1 \equiv_n s_1'\,] \wedge \forall s_1'\,[\,s' \to s_1' \Rightarrow \exists s_1\ s \to s_1 \wedge s_1 \equiv_n s_1'\,]$
We have $\equiv_{n+1}\ \subseteq\ \equiv_n$ => $\exists n$: $\equiv_n\ =\ \equiv_{n+1}\ =\ \equiv$ (finite models)

Abstraction is the key step in verifying systems of realistic size
• it means constructing an abstract
system (with fewer details)
• and establishing a relation between the abstract and the original system
- exact abstractions: preserve truth value
- conservative abstractions (approximations): correctness of the abstract system implies correctness of the real system, but not conversely (a counterexample in the abstract system may not exist in the real one)
The abstract model must be obtained without building the concrete one (the latter is often impossible due to size)
- abstraction techniques (e.g. reduced domain for variables)

Timed abstractions (region automaton; zone graph)
- are abstractions of infinite-state systems
- several states in the concrete system match one state in the abstract system
A specification is usually an abstraction of the implementation
- the tableau for an LTL formula is an abstraction of any system that satisfies it
- relations (language inclusion, simulation, etc.) between two different systems
Using 1-bit packets in the protocol model of project 1 (data abstraction)

Abstraction by removal of variables that do not affect the specification
Let $M$ be a system with variable set $V = \{v_1, v_2, \ldots, v_n\}$, described by the equations $v_i' = f_i(V)$.
Let $V'$ be the set of variables referenced in the specification.
The cone of influence of $V'$ = the minimal set $C \subseteq V$ such that
- $V' \subseteq C$
- if $v_i \in C$ and $f_i$ depends on $v_j$, then $v_j \in C$ (transitive closure)
We build a new system $M'$ by eliminating all the variables that do not appear in $C$, together with their functional equations.

We prove that cone of influence reduction preserves the truth values of CTL* specifications (defined over variables from $C$).
Let $V = \{v_1, v_2, \ldots, v_n\}$ be a set of boolean variables and $M = (S, S_0, R, L)$, with:
- $S = \{0,1\}^n$ = set of assignments to $V$; $S_0 \subseteq S$
- $R = \bigwedge_{i=1}^{n} [\,v_i' = f_i(V)\,]$
- $L(s) = \{v_i \mid s(v_i) = 1\}$ (variables equal to 1 in $s$)
Let $V$ be numbered such that $C = \{v_1, \ldots, v_k\}$. We define $M' = (S', S_0', R', L')$:
- $S' = \{0,1\}^k$ = set of assignments to $C$
- $S_0' = \{(d_1', \ldots, d_k') \mid \exists (d_1, \ldots, d_n) \in S_0$ with $d_1 = d_1' \wedge \ldots \wedge d_k = d_k'\}$
- $R' = \bigwedge_{i=1}^{k} [\,v_i' = f_i(V)\,]$
- $L'(s') = \{v_i \mid s'(v_i) = 1\}$
We can show that the concrete model $M$ and the abstract model $M'$ are bisimilar.

Program slicing: a similar but more general notion for programs [Weiser '79]
- inspired by the mental processes performed during debugging
= calculating the program fragment that can affect the computed values at a given point of interest (slicing criterion), e.g. a variable at a source line
- usually: an executable program fragment, in the source language
- based on the program analysis notions of control and data dependence
Types of slicing:
- static or dynamic
- syntactic or semantic criteria
- forward or backward traversal of the control graph
- type of control graph dependence: forward / backward; direct / transitive
- on all or some paths through the control graph

Data abstraction
- used for reasoning about circuits with large bit width, or about programs with complex data structures
- useful if data processing operations are relatively simple (transfer, small number of arithmetic / logic ops)
Main idea: establishing a correspondence between the original domain of data and a smaller domain (usually a few values)
Example: sign abstraction, with abstract values $-$, $0$, $+$ (table of abstract operations omitted), where $\top = \{-, 0, +\}$
=> we cannot always have a precise abstraction
=> the abstraction domain and function must be carefully chosen

- for any variable $x$, we define an abstract variable $\hat{x}$
- we label states with atomic propositions indicating the abstract value (for sign abstraction: 3 propositions $p_x^-$, $p_x^0$, $p_x^+$ for each variable $x$, indicating
$\hat{x}$ = "-", $\hat{x}$ = 0, $\hat{x}$ = "+")
- we collapse all states with the same abstract labels
=> abstract state space: $2^{AP}$, $AP$ = abstract propositions
For an explicitly represented model $M$, we define the abstract (reduced) model $M_r = (S_r, S_r^0, R_r, L_r)$:
- $S_r = \{L_r(s) \mid s \in S\}$ = abstract labelings of states in $S$
- $S_r^0 = \{s_r \in S_r \mid \exists s_0 \in S^0$: $L_r(s_0) = s_r\}$ (labelings of initial states)
- $R_r(s_r, t_r) \Leftrightarrow \exists s, t \in S$: $R(s,t) \wedge L_r(s) = s_r \wedge L_r(t) = t_r$ (a transition between two abstract states exists if there is a transition between concrete representatives)
We can prove: the abstract model simulates the original (concrete) model $M$.

Example: 3-state traffic light reduced to 2 states. Relabeling: two of the three concrete states get the label stop, the third gets the label go. Collapsing: the two stop states become one.
Note: the abstract system may introduce new behaviors (e.g., the system can stay in the "stop" state forever).

Consider a system represented implicitly, by predicates for the transition relation $R$ and the initial states $S_0$.
We assume the same abstraction function for all variables, $h : D \to A$ ($D$ = concrete domain, $A$ = abstract domain).
We must define $\hat{S}_0$ and $\hat{R}$ for the abstract system:
$\hat{S}_0(\hat{x}_1, \ldots, \hat{x}_n) = \exists x_1 \ldots \exists x_n$: $S_0(x_1, \ldots, x_n) \wedge h(x_1) = \hat{x}_1 \wedge \ldots \wedge h(x_n) = \hat{x}_n$
We similarly define $\hat{R}(\hat{x}_1, \ldots, \hat{x}_n, \hat{x}_1', \ldots, \hat{x}_n')$
=> from $\phi(x_1, \ldots, x_n)$ we obtain $\hat{\phi}(\hat{x}_1, \ldots, \hat{x}_n)$, expressed in abstract variables

Transforming $\phi \to \hat{\phi}$ may be a complex operation
=> we apply it (like negation) just to elementary relations between variables (e.g., =, <, etc.)
Define by structural induction an approximate abstraction $\mathcal{A}$:
- $\mathcal{A}(P(x_1, \ldots, x_n)) = \hat{P}(\hat{x}_1, \ldots, \hat{x}_n)$, if $P$ is an elementary relation
- $\mathcal{A}(\neg P(x_1, \ldots, x_n)) = \widehat{\neg P}(\hat{x}_1, \ldots, \hat{x}_n)$
- $\mathcal{A}(\phi_1 \wedge \phi_2) = \mathcal{A}(\phi_1) \wedge \mathcal{A}(\phi_2)$
In particular, $\hat{S}_0 \Rightarrow \mathcal{A}(S_0)$ and $\hat{R} \Rightarrow \mathcal{A}(R)$ (the approximation may introduce additional initial states and transitions).
Let the approximate abstract model be $M_a = (S_r, \mathcal{A}(S_0), \mathcal{A}(R), L_r)$. Then $M \preceq M_a$
(the approximate abstract model simulates the original).

If the abstraction function preserves the relations which correspond to primitive operations in a program, the abstraction $\mathcal{A}$ is exact.
An abstraction function $h_x$ defines an equivalence relation between the concrete values for $x$ which correspond to the same abstract value: $d_1 \sim_x d_2 \Leftrightarrow h_x(d_1) = h_x(d_2)$
If the value of any primitive relation $P$ in the program is the same for any two tuples of equivalent concrete values:
$\forall d_i, d_i'$: $\bigwedge_{i=1}^{n} d_i \sim_{x_i} d_i' \Rightarrow P(d_1, \ldots, d_n) = P(d_1', \ldots, d_n')$
then $M \equiv M_a$ (the abstract model is bisimilar to the concrete model, so the abstraction is exact).

Abstract interpretation: a method for defining the abstract semantics of a program, which can be used to analyse the program and produce information about its runtime behavior [Cousot & Cousot '77]
Consists in:
- a concrete domain $D$ and an abstract domain $A$, linked via a Galois connection:
- an abstraction function $\alpha : D \to A$
- a concretization function $\gamma : A \to \mathcal{P}(D)$ (associates to each abstract state a set of concrete states)
- $\forall x \in \mathcal{P}(D)$: $x \subseteq \gamma(\alpha(x))$ and $\forall a \in A$: $a = \alpha(\gamma(a))$
(abstraction followed by concretization introduces approximation; concretization followed by abstraction is exact)
The majority of abstractions can be formulated in this general framework.

For arithmetic circuits / programs, the abstraction defined by $h(x) = x \bmod n$, $n \in \mathbb{Z}$,
preserves primitive arithmetic relations, because $((x \bmod n) + (y \bmod n)) \bmod n = (x + y) \bmod n$, etc.
Additionally (Chinese remainder theorem): if $n_1, \ldots, n_k$ are pairwise relatively prime and $n = n_1 n_2 \cdots n_k$, then
$x \equiv y \pmod{n} \Leftrightarrow \bigwedge_{i=1}^{k} x \equiv y \pmod{n_i}$
=> to verify 16-bit arithmetic, it suffices to verify the implementation for integers modulo 5, 7, 9, 11, 32 (product $110880 > 2^{16}$)

Symbolic abstraction: to verify the datapaths of a system (main function: computing and preserving values)
Example: correct transmission from $a$ to $b$; initially, for a fixed value: $AG(a = 17 \to AX\ b = 17)$
Abstraction function: $h(x) = 1$ if $x = 17$, $0$ otherwise
More generally, we introduce the symbolic parameter $c$: $h_c(x) = 1$ if $x = c$, $0$ otherwise
=> abstract transition relation $\hat{R}(\hat{a}, \hat{a}', \hat{b}, \hat{b}', c)$
In a BDD representation, $c$ does not affect the complexity if the system behavior does not depend on $c$.
Example: pipelined adder with two stages: $AG(reg1 = a \wedge reg2 = b \to AX\ AX\ sum = a + b)$

Compositional reasoning: an application of "divide and conquer" to the verification of a system built from components
- verification of local properties of components
- deriving global properties from component properties
- without constructing a model of the entire system (impractical)
Compositional reasoning is a generic term for rules of the form:
- $M_1 \models f_1 \wedge M_2 \models f_2 \Rightarrow Compose(M_1, M_2) \models LogicOp(f_1, f_2)$, e.g. parallel composition, and $LogicOp = \wedge$
- $M_1 \preceq M_2 \Rightarrow CompOp(M_1) \preceq CompOp(M_2)$, e.g. $\preceq$ = implementation, refinement; $CompOp(\cdot) = \cdot \parallel M$
- $M_1 \preceq S_1 \wedge M_2 \preceq S_2 \Rightarrow Compose(M_1, M_2) \preceq Compose(S_1, S_2)$

Let $M = (S, S_0, AP, L, R, F)$ and $M' = (S', S_0', AP', L', R', F')$.
Define the synchronous parallel composition $M'' = M \parallel M'$:
- $S'' = \{(s, s') \in S \times S' \mid L(s) \cap AP' = L'(s') \cap AP\}$
- $S_0'' = (S_0 \times S_0') \cap S''$
- $AP'' = AP \cup AP'$
- $L''(s, s') = L(s) \cup L'(s')$
- $R''((s, s'), (t, t')) = R(s, t) \wedge R'(s', t')$
- $F'' = \{(P \times S') \cap S'' \mid P \in F\} \cup \{(S \times P') \cap S'' \mid P' \in F'\}$

We use ACTL with fairness: for any ACTL formula $f$ we can construct a tableau $T_f$, and we have $M \models_F f \Leftrightarrow M \preceq_F T_f$
=> we can reason uniformly with formulas and models (tableaux)
(a) for any $M$ and $M'$: $M \parallel M' \preceq_F M$
(b) for any $M$, $M'$ and $M''$: $M \preceq_F M' \Rightarrow M \parallel M'' \preceq_F M' \parallel M''$
(c) for any $M$: $M \preceq_F M \parallel M$

We use the notation $\langle f \rangle M \langle g \rangle$: any system that satisfies the assumption $f$ and contains $M$ guarantees $g$ ($f$, $g$ are either formulas or models).
A typical reasoning structure: $\langle true \rangle M \langle A \rangle \wedge \langle A \rangle M' \langle g \rangle \wedge \langle g \rangle M \langle f \rangle \Rightarrow \langle true \rangle M \parallel M' \langle f \rangle$
Instantiation in concrete terms:
$M$ = a complex transmitter
$A$ = a simple model of a periodic transmitter
$\langle true \rangle M \langle A \rangle$: $M$ behaves the same as $A$
$M'$ = a receiver
$g$ = "messages are picked up in time"
$\langle A \rangle M' \langle g \rangle$: $M'$ composed with $A$ picks up messages in time
$f$ = "there is no buffer overflow"
$\langle g \rangle M \langle f \rangle$: if $M$ is part of a system that picks up messages in time, there is no buffer overflow
=> in the system $M \parallel M'$ no buffer overflow occurs

(1) $M \preceq_F A$ : hypothesis
(2) $M \parallel M' \preceq_F A \parallel M'$ : (1) and compositionality (b)
(3) $A \parallel M' \models_F g$ : hypothesis
(4) $A \parallel M' \preceq_F T_g$ : (3) and the ACTL tableau property
(5) $M \parallel M' \preceq_F T_g$ : (2), (4) and transitivity of $\preceq_F$
(6) $M \parallel M \parallel M' \preceq_F T_g \parallel M$ : (5) and compositionality (b)
(7) $T_g \parallel M \models_F f$ : hypothesis
(8) $M \parallel M \parallel M' \models_F f$ : (6), (7) and $\preceq_F \Rightarrow \models_F$
(9) $M \preceq_F M \parallel M$ : compositionality (c)
(10) $M \parallel M' \preceq_F M \parallel M \parallel M'$ : (9) and compositionality (b)
(11) $M \parallel M' \models_F f$ : (8), (10) and $\preceq_F \Rightarrow \models_F$
Theorem provers can mechanize the decomposition into component-level reasoning and ensure the validity of the deduction.

Often, compositional rules are not strong enough.
Consider implementations $M_i$ and specifications $S_i$, $i = 1, 2$.
To prove $M_1 \parallel M_2 \preceq S_1 \parallel S_2$ it would suffice that $M_i \preceq S_i$.
Often we only have $S_Q \Rightarrow M_R \models S_R$ and $S_R \Rightarrow M_Q \models S_Q$ (each module functions correctly in the environment given by the specification of the other).
=> Can we deduce from this that $M_Q \parallel M_R \models S_Q \wedge S_R$?
Studied in various contexts [Chandy & Misra '81, Abadi & Lamport '93].
We refer to Reactive Modules [Alur & Henzinger '95]:
- modules with input and output variables, and a transition relation
- dependence relation $\succ\ \subseteq (V_{in} \cup V_{out}) \times V_{out}$
- $x \succ y$: $y'$ depends combinationally on $x'$; otherwise, only the next value of $y$ can depend sequentially on $x$
- synchronous parallel composition $M_1 \parallel M_2$ is possible if $V_{out}(M_1) \cap V_{out}(M_2) = \emptyset$ and $\succ_{M_1} \cup \succ_{M_2}$ is an acyclic relation
We define the (implementation) relation $M$ …
- … $Q_2$ true at $t + 1$
- if $P_2 \wedge Q_2$ true at $0, 1, \ldots, t$ => $Q_1$ true at $t + 1$
- then for any $t$, $P_1 \wedge P_2 \Rightarrow Q_1 \wedge Q_2$

[Henzinger '01]: study of the theory of interfaces
For a refinement relation, two variants:
• if $M_1$ …

… program, or may be …
Check that data was read correctly: test the result of the input function (NOT just the value read).
Avoid overflow when reading strings and arrays: stop reading when the array limit is reached.
Buffer overflows => the system is vulnerable to attacks.
Unvalidated input may corrupt memory (program data) or allow code injection (attacker runs code) => some of the most dangerous errors.
A badly written program is worse than no program. An ignorant programmer is worse than no program(mer) at all!
You can only try to read data; the call may not succeed:
- system: no more data (end-of-file), read error, etc.
- user: data not in the needed format (illegal char, not a number, etc.)
I/O functions report both a result and an error status:
1. the result type is extended to include an error code
   getchar(): unsigned char converted to int, or EOF (-1), which is different from any unsigned char
2. the return type may have a special invalid value
   fgets returns the address where the line was read (first argument) or NULL (invalid pointer value) when nothing was read
3. return the status and store the result at a given pointer
   scanf returns the number of items read (can be 0, or EOF at end-of-input) and takes as arguments the addresses where it should place the read data

Reading a character: int getchar(void);
No parameters. Returns an unsigned char converted to int, or EOF (negative, usually -1) if no char could be read.
int ungetc(int c, FILE *stream); puts a character c back into a given input stream (file)
For standard input: ungetc(c, stdin);
DON'T unget more chars at once (effect not guaranteed); you must read between successive calls to ungetc.

Writing a char: int putchar(int c);
Writes an int, converted to unsigned char, to stdout; returns its value, or EOF (constant, usually -1) on error.
DON'T putchar(EOF): -1 is converted to 255 (an actual char).
All input/output functions are declared in stdio.h (unless noted).

Reading a line: char *fgets(char *s, int size, FILE *stream);
Reads up to and including the newline \n, at most size-1 characters. Stores the line in array s and adds '\0' at the end.
char tab[80]; if (fgets(tab, 80, stdin)) { /* line was read */ }
The third parameter to fgets indicates the file (stream) from which to read: stdin (stdio.h) is standard input (keyboard unless redirected).
fgets returns NULL if nothing was read (end-of-file); if successful, it returns the address passed as argument (thus non-null)
=> test the result to find out if the read was successful
char s[81]; while (fgets(s, 81, stdin)) printf("%s", s);
A line with > 80 chars will be read and printed piecewise (OK!)
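The piecewise behaviour of fgets on long lines can be observed with a small sketch. The helper name count_lines and the deliberately tiny 8-byte buffer are assumptions made for this illustration, not part of the lecture; the function takes a FILE * so that it can be tried on any stream, including stdin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts complete lines in `in`, reading through a
   deliberately small buffer. *pieces receives the number of successful
   fgets calls, which exceeds the line count when lines are long. */
long count_lines(FILE *in, long *pieces)
{
    char buf[8];                 /* room for 7 characters plus '\0' */
    long lines = 0;

    assert(in != NULL && pieces != NULL);
    *pieces = 0;
    while (fgets(buf, sizeof buf, in)) {  /* NULL: nothing more was read */
        ++*pieces;
        if (strchr(buf, '\n'))            /* this piece ends a line */
            ++lines;
    }
    return lines;
}
```

A call such as count_lines(stdin, &n) leaves in n how many reads were needed. A final line that is not terminated by a newline adds a piece but is not counted as a complete line.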
More complex: we can test if the read line was truncated:
int c; char s[81];
if (fgets(s, 81, stdin)) {
  if (strlen(s) == 80 && s[79] != '\n' && (c = getchar()) != EOF) {
    printf("truncated: %s\n", s); ungetc(c, stdin);
  } else printf("complete: %s", s);
}
The C11 standard removed gets => it is no longer a standard function: it did not limit the size read, so it was impossible to use safely.

Writing a string: int puts(const char *s);
prints string s followed by a newline \n: puts("text");
int fputs(const char *s, FILE *stream); prints string s to the given output stream: fputs("text", stdout);
fputs(s, stdout); is like printf("%s", s); it prints string s as is, without an additional newline.
stdout is standard output (screen unless redirected).
puts and fputs return EOF on error, nonnegative on success.

Formatted output: int printf(const char *format, ...);
(functions with a variable number of parameters are discussed later)
First parameter: the format string; it may contain:
- usual characters (are printed)
- format specifiers: % and a letter: %c char, %d, %i decimal, %e, %f, %g real, %o octal, %p pointer, %s string, %u unsigned, %x hexadecimal, %a hex float
Remaining parameters: expressions; their values are printed. Their number and type must correspond to the format specifiers.
Result: number of characters printed (usually not used / ignored)
Example: printf("%d %f\n", 3, sqrt(3));

Formatted input: int scanf(const char *format, ...);
First arg: format string, with format specifiers (some differ from printf)
Remaining parameters: addresses where to store the read values
We need addresses, NOT necessarily & (which is one way to get addresses).
DON'T use & for strings: an array name is already its address.
Returns the number of objects read (assigned) (NOT their value!), or EOF when an error / end-of-file is reached before anything is read.
Read one integer: int n; if (scanf("%d", &n) == 1) printf("%d\n", n); else puts("not a number");
Format specifiers: like for printf: %u unsigned, %o octal, %x hexadecimal; %i any int format (same as %d in printf)
Reading numbers skips any initial \t \n \v \f \r and space, as checked by isspace().
Like in printf, we can combine arbitrary formats; test the number of objects read successfully:
if (scanf("%d%d", &x, &y) != 2) { /* handle error */ }

Format letter s: for reading a word (string WITHOUT whitespace)
It cannot read a sentence such as "This is a test."; to read a line, use fgets.
Arrays are ALWAYS limited!
=> give a max length (a constant) one less than the array length; scanf will add \0
In scanf => %, the max length, and s:
char word[33]; if (scanf("%32s", word) == 1) printf("%s\n", word);
scanf with the s format skips initial \t \n \v \f \r and space, as checked by isspace().
Array names are addresses: DON'T use &.
The format reads a word (up to whitespace).

For repeated processing (while input matches the format), write: while (read successful)
while (fgets(...)) { }
while ((c = getchar()) != EOF) { }
while (scanf(...) == how-many-to-read) { }
On loop exit check: end-of-file? (nothing more), or (format) error
int feof(FILE *stream); returns nonzero if end-of-file was reached for the given stream
if feof(stdin), input is finished; else the input does not match the format => read the next char(s) and report the error
NO: while (!feof(stdin)) scanf("%d", &n); DON'T use feof as the read loop condition.
After the last good read (number), end-of-input is not yet reached unless there is nothing more (no whitespace, newline) after it; the next read will not succeed, but this is not checked.

Handling errors. Simplest: primitive, but effective: exit, from stdlib.h, ends the program.
We can write an error function that prints a message and calls exit():
void fatal(const char *msg) { fputs(msg, stderr); exit(EXIT_FAILURE); }
We can then use this function for reading: if (scanf("%d", &n) != 1) fatal("error reading n\n");
Good practice: writing messages to stderr lets us separate errors from output (using redirection).

scanf does not consume non-matching input => consume it before trying again:
int m, n; printf("two numbers: ");
while (scanf("%d%d", &m, &n) != 2) {
  for (int c; (c = getchar()) != '\n'; ) if (c == EOF) exit(1);
  printf("try again: ");
}

Often, we have to fill an array up to some stopping condition:
- read from input up to a given character (period, \n, etc.)
- copy from another string or array
Arrays must not be written beyond their length!
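Filling an array up to a stopping character without ever writing beyond its length could look like the sketch below. The helper name read_until and its parameters are assumptions made for this illustration; getc is used so that the same code works for stdin or any other stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reads characters from `in` into buf until the
   `stop` character, end-of-file, or until size-1 characters are stored.
   The stop character is consumed but not stored. The result is always
   '\0'-terminated; returns the number of characters stored. */
size_t read_until(FILE *in, char *buf, size_t size, int stop)
{
    size_t i = 0;
    int c;                       /* int, not char: must be able to hold EOF */

    assert(in != NULL && buf != NULL && size > 0);
    /* the length test comes first, so no character is read
       unless there is still room to store it */
    while (i < size - 1 && (c = getc(in)) != EOF && c != stop)
        buf[i++] = (char)c;
    buf[i] = '\0';
    return i;
}
```

For example, read_until(stdin, name, sizeof name, '.') reads a sentence up to its period. When the array fills up first, the remaining characters stay in the input and can be picked up by the next call.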
for (int i = 0; i …
… d=5, m=11, y=2013 (see later how to enforce 2 or 4 digits)

scanf reads until the input does not match the format.
Non-matching chars are not read; the corresponding variables are not assigned.
scanf("%d%d", &x, &y); input: 123A returns 1; x = 123, y: unchanged; input rest: A
scanf("%d%x", &x, &y); input: 123A returns 2; x = 123, y = 0xA (10)

Reading a string of allowed characters: written between [ ] (ranges: with -)
Reading stops at the first disallowed character.
char a[33]; if (scanf("%32[A-Za-z]", a) == 1) … max 32 letters
char num[33]; if (scanf("%32[0-9]", num) == 1) … string of digits
Give the max length between % and [ ].
Reading a string like above, but use ^ after [ to specify disallowed chars:
char t[81]; if (scanf("%80[^.\n]", t) == 1) … reads up to a period or newline
The format is %[ ], NOT with s: %20[A-Z]s is wrong.

One character:
int c = getchar(); if (c != EOF) { }
int c; while ((c = getchar()) != EOF) { }
With scanf (use char, not int; useful for arrays):
char c; if (scanf("%c", &c) == 1) { /* read OK */ }
Reading a fixed number of characters:
char tab[80]; scanf("%80c", tab);
reads EXACTLY 80 chars (including whitespace), DOES NOT add '\0' at the end => we can't know if all were read
Check how many were read by initializing with zeroes and testing the length (or with the %n format, see later):
char tab[81] = ""; scanf("%80c", tab); size_t len = strlen(tab);

Whitespace in scanf:
- numeric formats and %s consume and ignore initial whitespace; "%d%d": two ints separated and possibly preceded by whitespace
- in formats %c, %[ ], %[^ ] whitespace characters are normal chars (not ignored)
- a space in the format consumes any >= 0 whitespace in the input
scanf(" "); consumes whitespace until the first non-space char
"%c %c" reads a char, consumes >= 0 whitespace, reads another char
"%d %d" is like "%d%d" (whitespace allowed anyway)
"%d ": a space after the number consumes ALL whitespace (including newlines!)
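The difference between %c with and without a preceding space in the format can be tried out on a fixed string. This sketch uses sscanf, the string-reading counterpart of scanf (same format rules), so that the input can be given inline; the two function names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* The space before %c consumes any whitespace, so op receives the
   first non-space character after the first number. */
int parse_skipping(const char *text, int *a, char *op, int *b)
{
    assert(text != NULL && a != NULL && op != NULL && b != NULL);
    return sscanf(text, "%d %c%d", a, op, b);
}

/* Without the space, %c does not skip whitespace: op receives
   whatever character directly follows the first number. */
int parse_literal(const char *text, int *a, char *op, int *b)
{
    assert(text != NULL && a != NULL && op != NULL && b != NULL);
    return sscanf(text, "%d%c%d", a, op, b);
}
```

On the text "12 +34" both calls report 3 converted items, but parse_skipping stores '+' in op, while parse_literal stores the space and lets the final %d read +34 as a signed number. The return value alone does not reveal the difference, so the format has to be chosen with the whitespace rules in mind.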
Consume whitespace, but not the newline \n: scanf("%*[\t\v\f\r ]");
The * modifier means consume and ignore (no address is given).

To consume and ignore (skip) data with a given format:
use * after % without specifying an address where to read
=> scanf reads according to the pattern, but does not store the data and does not count it in the result (number of read objects)
Example: text with three grades and an average, we need just the average:
if (scanf("%*d%*d%*d%f", &avg) == 1) { }
Example: consume the rest of the line:
scanf("%*[^\n]"); if (getchar() == EOF) { }

A number between % and the format character limits the count of chars read:
%4d: int, at most 4 chars (initial spaces don't count, the sign does!)
scanf("%2d%2d", &m, &n); input 12 34: m=12 n=34
scanf("%2d%2d", &m, &n); input 12345: m=12 n=34, rest: 5
scanf("%d.%d", &m, &n); input 12.34: m=12 n=34
scanf("%f", &x); input 12.34: x=12.34
scanf("%d%x", &m, &n); input 123a: m=123 n=0xA

Format specifiers in scanf:
%d: signed decimal int
%i: signed decimal, octal (0) or hexadecimal (0x, 0X) int
%o: octal (base 8) int, optionally preceded by 0
%u: unsigned decimal int (accepts a negative number and converts it)
%x, %X: hexadecimal int, optionally with 0x, 0X
%c: any char, including whitespace
%s: string of chars, up to the first whitespace; '\0' is added
%[…]: string of the indicated allowed characters
%[^…]: string of anything except the indicated disallowed chars
%s and %[ ] must have a constant max length, unless assignment is suppressed with *
%a, %A, %e, %E, %f, %F, %g, %G: real (possibly with exponent)
%p: pointer, as printed by printf
%n: writes into the argument (int *) the count of chars read so far; does not read; does not add to the count of read objects (return value)
%%: percent character

Format specifiers in printf:
%d, %i: signed decimal int
%o: unsigned octal int, without initial 0
%u: unsigned decimal int
%x, %X: hexadecimal int, without 0x / 0X; lower / upper case
%c: character
%s: string of characters, up to '\0' or the indicated precision
%f, %F: real without exponent; 6 decimal digits; no dot if precision is 0
%e, %E: real with exponent; 6 decimal digits; no dot if precision is 0
%g, %G: real, like %e, %E if exp < -4 or exp >= precision; else like %f. Does not print zeroes or the decimal point if useless
%a, %A: hexadecimal real with decimal exponent of 2: 0xh.hhhhp±d
%p: pointer, usually in hexadecimal
%n: writes into the argument (int *) the count of chars written so far
%%: percent character

Format specifiers may have other optional elements: % flag size . precision modifier type
Flags:
*: field is read but not assigned (is ignored) (scanf)
-: aligns the value left for the given size (printf)
+: + before a positive number of signed type (printf)
space: space before a positive number of signed type (printf)
0: left-filled with 0 up to the indicated size (printf)
Modifiers:
hh: argument is char (for d i o u x X n formats) (1 byte)
  char c; scanf("%hhd", &c); input 123 => c = 123 (1 byte)
h: argument is short (for d i o u x X n formats), e.g. %hd
l: argument is long (formats d i o u x X n) or double (formats a A e E f F g G)
  long n; scanf("%ld", &n); double x; scanf("%lf", &x);
ll: argument is long long (for d i o u x X n formats)
L: argument is long double (for a A e E f F g G formats)
Size: an integer
- scanf: maximal character count read for this argument
- printf: minimal character count for printing this argument, right aligned and filled with spaces, or according to the flags
Precision: only in printf; a dot optionally followed by an integer (if only the dot, precision is zero)
- minimal number of digits for d i o u x X (filled with 0)
- number of decimal digits (for a A e E f F) / significant digits (for g G)
  printf("|%7.2f|", 15.234); prints |  15.23| (2 decimals, 7 total)
- maximal number of chars to print from a string (for s)
  char m[3] = {'a', 'b', 'c'}; printf("%.3s", m); (for a string without '\0')
In printf, we can have * instead of the size and/or precision. Then the size / precision is given by the next argument:
printf("%.*s", max, s); prints at most max chars

Floating point numbers in various formats (the format strings and outputs of this example are missing): printf(…, 1.0/1100); printf(…, 1.0/1100); printf(…, 1.0/11000); printf(…, 1.0); printf(…, 1.0); printf(…, 1.0); printf(…, 1.009); printf(…, 1.009);

Writing integers in table form:
printf("|%6d|", -12); prints |   -12|
printf("|%6d|", 12); prints |    12|
printf("|%-6d|", -12); prints |-12   |
printf("|%06d|", -12); prints |-00012|
printf("|%+6d|", 12); prints |   +12|
Write 20 characters in total (printf returns the count of written chars):
int m, n, len = printf("%d", m); printf("%*d", 20-len, n);
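Size, precision and the - flag can be combined to lay out a table row. The sketch below uses snprintf, which accepts the same format strings as printf but writes into a character array, so the result can be inspected; the function name format_row and the column widths are assumptions chosen for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one table row into out.
   %-8.8s : left-aligned, padded to 8 and truncated to 8 characters
   %8.2f  : right-aligned, 8 characters wide, 2 decimals
   Returns what snprintf returns: the number of characters the full
   row needs (19 here), not counting the terminating '\0'. */
int format_row(char *out, size_t size, const char *name, double value)
{
    assert(out != NULL && name != NULL && size > 0);
    return snprintf(out, size, "|%-8.8s|%8.2f|", name, value);
}
```

With a 32-byte array buf, format_row(buf, sizeof buf, "temperature", 15.234) produces |temperat|   15.23|. The precision in %-8.8s keeps a long name from breaking the column alignment, at the price of truncating it.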
Two characters separated by a single space (consumed by %*1[ ]):
char c1, c2; if (scanf("%c%*1[ ]%c", &c1, &c2) == 2) …
Read an int with exactly 4 digits:
int n1, n2, x; if (scanf(" %n%d%n", &n1, &x, &n2) == 1 && n2 - n1 == 4) …
%n counts the chars read; store the counters in n1, n2, then subtract.
Read / check for a word that must appear (e.g. a 7-character word such as http://):
int nr = 0; scanf("http://%n", &nr); if (nr == 7) { /* word found */ } else { /* word missing */ }
Ignore up to (and excluding) a given char (\n): scanf("%*[^\n]");

Test for the right number of read objects, not just nonzero!
if (scanf("%d", &n) == 1), not just if (scanf("%d", &n))
scanf may also return EOF, which is nonzero!
For integers, test overflow using errno (errno.h):
if (scanf("%d", &x) == 1) if (errno == ERANGE) { printf("number too large\n"); errno = 0; }

ERRORS with scanf
NO: if (scanf("%d", &n))
DON'T test for a nonzero result: it could be > 0 (items read), or -1 (EOF), nothing read!
YES: if (scanf("%d", &n) == how-many-items-wanted)
NO: scanf("%20[A-Z]s", buf)
The format is %[ ], not %[ ]s.
YES: if (scanf("%20[A-Z]", buf) == 1)
NO: scanf("%s,%d", name, &grade)
The s format reads everything non-whitespace, so it won't stop at the comma.
YES: if (scanf("%19[^,],%d", name, &grade) == 2) (with char name[20]), to read a string with no comma (all else allowed, including whitespace), the comma, and a number

28 October 2003
Formal verification, Lecture 7, Marius Minea
Comparing models. Abstraction. Compositional reasoning

Specification formulas can be converted to automata (LTL tableau construction)
- they represent the "simplest" system that conforms to the specification
When using an automaton as specification:
- what does it mean to say "the system functions like this automaton"?
How does one build (abstract) a simpler model from a complex one?
Does verifying a simpler model ensure correctness of the initial one?
Can one deduce correctness of a composite model from proving properties of the components?
Consider a Kripke structure $M$ with a set $AP$ of atomic propositions.
The language of $M$ = the set of execution traces, seen as sequences of labels.
Formally: $\mathcal{L}(M)$ = the set of infinite words (strings) $\alpha_0 \alpha_1 \alpha_2 \ldots$ such that there exists a path $s_0 s_1 s_2 \ldots$ of $M$ with $L(s_i) = \alpha_i$.
Language inclusion preserves LTL properties: $\mathcal{L}(M) \subseteq \mathcal{L}(S) \wedge S \models f \Rightarrow M \models f$

Consider two structures $M$ and $M'$, with $AP \supseteq AP'$.
A relation $\preceq\ \subseteq S \times S'$ is a simulation relation between $M$ and $M'$ iff for all $s \preceq s'$:
- $L(s) \cap AP' = L'(s')$ ($s$ and $s'$ are labeled identically with respect to $AP'$)
- for every $s_1$ with $s \to s_1$ there exists $s_1'$ with $s' \to s_1'$ and $s_1 \preceq s_1'$ (any successor of $s$ is simulated by a successor of $s'$)
The structure $M'$ simulates $M$ ($M \preceq M'$) if there exists a simulation relation $\preceq$ such that for the initial states: $\forall s_0 \in S_0\ \exists s_0' \in S_0'$: $s_0 \preceq s_0'$.
Prop.: The simulation relation is a preorder over the set of structures (reflexive and transitive). For transitivity we choose: $s \preceq s'' \Leftrightarrow \exists s'$: $s \preceq_1 s' \wedge s' \preceq_2 s''$.
Theorem: if $M \preceq M'$, then $M' \models f \Rightarrow M \models f$, for any ACTL* formula $f$ over $AP'$.

Let $M$ and $M'$ be two structures with $AP' = AP$.
A relation $\equiv\ \subseteq S \times S'$ is a bisimulation relation between $M$ and $M'$ iff for all $s, s'$ with $s \equiv s'$:
- $L(s) = L'(s')$
- for every $s_1$ with $s \to s_1$ there exists $s_1'$ with $s' \to s_1'$ and $s_1 \equiv s_1'$
- for every $s_1'$ with $s' \to s_1'$ there exists $s_1$ with $s \to s_1$ and $s_1 \equiv s_1'$
(or: $\equiv$ is a simulation relation between $M$ and $M'$ and, read in the opposite direction, between $M'$ and $M$)
Structures $M$ and $M'$ are bisimilar if there exists a bisimulation relation $\equiv$ such that for the initial states: $\forall s_0 \in S_0\ \exists s_0' \in S_0'$: $s_0 \equiv s_0'$, and $\forall s_0' \in S_0'\ \exists s_0 \in S_0$: $s_0 \equiv s_0'$.
Prop.: The bisimulation relation is an equivalence relation among structures.
Theorem: if $M \equiv M'$ then $\forall f \in \mathrm{CTL}^*$: $M \models f \Leftrightarrow M' \models f$.
Conversely: two structures that satisfy the same CTL* (or even CTL) formulas are bisimilar (equivalently: two structures which are not bisimilar can be distinguished by a CTL
formula).

Generally: $M \preceq M' \Rightarrow \mathcal{L}(M)|_{AP'} \subseteq \mathcal{L}(M')$
In the figure: $\mathcal{L}(M_1) = \mathcal{L}(M_2)$ and $M_1 \preceq M_2$, but $M_2 \not\preceq M_1$.
Generally: $M \equiv M' \Rightarrow M \preceq M' \wedge M' \preceq M$
In the figure: $M_1 \preceq M_2$ and $M_2 \preceq M_1$, but $M_1 \not\equiv M_2$.
Equivalent definition (as a game): $M \equiv M'$ if any choice of a model and of a move in it can be matched by an equally labelled move in the other model (the choice of model is made at each step => symmetry).

$M_1 \equiv M_2$ (duplicating nodes does not change branching properties)

The relation $\preceq_F\ \subseteq S \times S'$ is a fair simulation relation between $M$ and $M'$ (with $AP' \subseteq AP$) iff for all $s \preceq_F s'$:
- $L(s) \cap AP' = L'(s')$
- for any path $\pi = s\,s_1 s_2 \ldots$ in $M$ there exists a fair path $\pi' = s'\,s_1' s_2' \ldots$ in $M'$ such that $\forall i > 0$: $s_i \preceq_F s_i'$
If $M \preceq_F M'$, then $\forall f \in \mathrm{ACTL}^*$: $M' \models_F f \Rightarrow M \models_F f$.
The relation $\equiv_F\ \subseteq S \times S'$ is a fair bisimulation relation between $M$ and $M'$ (with $AP' = AP$) iff for all $s \equiv_F s'$:
- $L(s) = L'(s')$
- for any path $\pi = s\,s_1 s_2 \ldots$
in $M$ there exists a fair path $\pi' = s' s_1' s_2' \ldots$ in $M'$ such that $\forall i > 0:\ s_i \equiv_F s_i'$
- for any fair path $\pi' = s' s_1' s_2' \ldots$ in $M'$ there exists a fair path $\pi = s s_1 s_2 \ldots$ in $M$ such that $\forall i > 0:\ s_i \equiv_F s_i'$
If $M \equiv_F M'$, then $\forall f \in$ CTL*, $M' \models_F f \Leftrightarrow M \models_F f$.

Deterministic system = single initial state; any two successors are differently labeled: $s \to s_1 \wedge s \to s_2 \wedge s_1 \neq s_2 \Rightarrow L(s_1) \neq L(s_2)$.
$M, M'$ deterministic: $M \preceq M' \Leftrightarrow \mathcal{L}(M) \subseteq \mathcal{L}(M')$.
In general, we recursively define:
- $s \preceq_0 s' \Leftrightarrow L(s) \cap AP' = L'(s')$
- $s \preceq_{n+1} s' \Leftrightarrow s \preceq_n s' \wedge \forall s_1 [s \to s_1 \Rightarrow \exists s_1' .\ s' \to s_1' \wedge s_1 \preceq_n s_1']$
We have $\preceq\ = \bigcap_i \preceq_i$ ⇒ $\exists n .\ \preceq_n\ =\ \preceq_{n+1}\ =\ \preceq$ (finite models).
$M, M'$ deterministic: $M \equiv M' \Leftrightarrow \mathcal{L}(M) = \mathcal{L}(M')$.
In general, we recursively define:
- $s \equiv_0 s' \Leftrightarrow L(s) = L'(s')$
- $s \equiv_{n+1} s' \Leftrightarrow s \equiv_n s' \wedge \forall s_1 [s \to s_1 \Rightarrow \exists s_1' .\ s' \to s_1' \wedge s_1 \equiv_n s_1'] \wedge \forall s_1' [s' \to s_1' \Rightarrow \exists s_1 .\ s \to s_1 \wedge s_1 \equiv_n s_1']$
We have $\equiv\ = \bigcap_i \equiv_i$ ⇒ $\exists n .\ \equiv_n\ =\ \equiv_{n+1}\ =\ \equiv$ (finite models).

Abstraction is the key step in verifying systems of realistic size:
• it means constructing an abstract system (with fewer details)
• and establishing a correspondence between the abstract and the original system
- exact abstractions: preserve truth value
- conservative abstractions (approximations): correctness of the abstract system implies correctness of the real system, but not conversely (a counterexample in the abstract system may not exist in the real one)
The abstract model must be obtained without building the concrete one (the latter is often impossible due to size):
- abstraction techniques (e.g. reduced domain for variables)

Timed abstractions (region automaton; zone graph)
- are abstractions of an infinite-state system
- several states in the concrete system match a state in the abstract system
A specification is usually an abstraction of the implementation
- the tableau for an LTL formula is an abstraction for a system that satisfies it
Relations (language inclusion, simulation, etc
) between two different systems.
Using 1-bit packets in the protocol model of project 1 (data abstraction).

Cone of influence: abstraction by removal of variables that do not affect the specification.
Let $M$ be a system with variable set $V = \{v_1, \ldots, v_n\}$ described by the equations $v_i' = f_i(V)$.
Let $V'$ be the set of variables referenced in the specification.
The cone of influence of $V'$ = minimal set $C \subseteq V$ such that
- $V' \subseteq C$
- if $v_i \in C$, and $f_i$ depends on $v_j$, then $v_j \in C$ (transitive closure)
We build a new system $M'$ eliminating all the variables that do not appear in $C$, together with their functional equations.

We prove that cone of influence reduction preserves the truth values of CTL* specifications (defined over variables from $C$).
Let $V = \{v_1, v_2, \ldots, v_n\}$ be a set of boolean variables and $M = (S, S_0, R, L)$, with:
- $S = \{0, 1\}^n$ = set of assignments to $V$; $S_0 \subseteq S$
- $R = \bigwedge_{i=1}^{n} [v_i' = f_i(V)]$
- $L(s) = \{v_i \mid s(v_i) = 1\}$ (variables equal to 1 in $s$)
Let $V$ be numbered such that $C = \{v_1, \ldots, v_k\}$. We define $M' = (S', S_0', R', L')$:
- $S' = \{0, 1\}^k$ = set of assignments to $C$
- $S_0' = \{(d_1', \ldots, d_k') \mid \exists (d_1, \ldots, d_n) \in S_0 \text{ with } d_1' = d_1 \wedge \ldots \wedge d_k' = d_k\}$
- $R' = \bigwedge_{i=1}^{k} [v_i' = f_i(C)]$
- $L'(s') = \{v_i \mid s'(v_i) = 1\}$
We can show that the concrete model $M$ and the abstract model $M'$ are bisimilar.

Program slicing: a similar but more general notion for programs [Weiser '79]
- inspired by the mental processes performed during debugging
= calculating the program fragment that can affect the computed values in a given point of interest (slicing criterion) (e.g. variable at source line)
- usually: an executable program fragment, in source language
- based on program analysis notions of control and data dependence
Types of slicing:
- static or dynamic
- syntactic or semantic criteria
- forward or backward traversal of control graph
- type of control
graph dependence: forward/backward; direct/transitive
- on all or some paths through the control graph

Data abstraction:
- used for reasoning about circuits with large bit width, or about programs with complex data structures
- useful if data processing operations are relatively simple (transfer, small number of arithmetic/logic ops)
Main idea: establishing a correspondence between the original domain of data and a smaller-size domain (usually a few values).
Example: sign abstraction, where T = {-, 0, +}
⇒ we cannot always have a precise abstraction
⇒ abstraction domain and function must be carefully chosen

- for any variable $x$, we define an abstract variable $\hat{x}$
- we label states with atomic propositions indicating the abstract value (for sign abstraction: 3 propositions $p_x^-, p_x^0, p_x^+$ for each variable $x$, indicating $\hat{x} = $ "-", $\hat{x} = 0$, $\hat{x} = $ "+")
- we collapse all states with the same abstract labels ⇒ abstract state space: $2^{AP}$, $AP$ = abstract propositions
For an explicitly represented model $M$, we define the abstract (reduced) model $M_r = (S_r, S_r^0, R_r, L_r)$:
- $S_r = \{L_r(s) \mid s \in S\}$ = abstract labelings of states in $S$
- $S_r^0 = \{s_r^0 \in S_r \mid \exists s_0 \in S^0 .\ L_r(s_0) = s_r^0\}$ (labelings of initial states)
- $R_r(s_r, t_r) \Leftrightarrow \exists s, t \in S .\ R(s, t) \wedge L_r(s) = s_r \wedge L_r(t) = t_r$ (transition between two abstract states if there is a transition between concrete representatives)
We can prove: the abstract model $M_r$ simulates the original (concrete) model $M$.

Example: 3-state traffic light reduced to 2 states (one abstract state is labeled "stop").
Note: the abstract system may introduce new behaviors (e.g., the system can stay in the "stop" state forever).

Consider a system represented implicitly, by predicates for the
transition relation $\mathcal{R}$ and the initial states $\mathcal{S}_0$.
We assume the same abstraction function for all variables, $h : D \to A$ ($D$ = concrete domain, $A$ = abstract domain).
We must define $\hat{\mathcal{S}}_0$ and $\hat{\mathcal{R}}$ for the abstract system:
$\hat{\mathcal{S}}_0 = \exists x_1 \ldots \exists x_n .\ \mathcal{S}_0(x_1, \ldots, x_n) \wedge h(x_1) = \hat{x}_1 \wedge \ldots \wedge h(x_n) = \hat{x}_n$
We similarly define $\hat{\mathcal{R}}(\hat{x}_1, \ldots, \hat{x}_n, \hat{x}_1', \ldots, \hat{x}_n')$
⇒ from $\phi(x_1, \ldots, x_n)$ we obtain $\hat{\phi}(\hat{x}_1, \ldots, \hat{x}_n)$ expressed in abstract variables.
Transforming $\phi \to \hat{\phi}$ may be a complex operation ⇒ we apply it (like negation) just to elementary relations between variables (e.g., =, <, etc.).
Define by structural induction an approximate abstraction $\mathcal{A}$:
- $\mathcal{A}(P(x_1, \ldots, x_n)) = \hat{P}(\hat{x}_1, \ldots, \hat{x}_n)$, if $P$ is an elementary relation
- $\mathcal{A}(\neg P(x_1, \ldots, x_n)) = \widehat{\neg P}(\hat{x}_1, \ldots, \hat{x}_n)$
- $\mathcal{A}(\phi_1 \wedge \phi_2) = \mathcal{A}(\phi_1) \wedge \mathcal{A}(\phi_2)$
- $\mathcal{A}(\phi_1 \vee \phi_2) = \mathcal{A}(\phi_1) \vee \mathcal{A}(\phi_2)$
- $\mathcal{A}(\exists x\, \phi) = \exists \hat{x}\, \mathcal{A}(\phi)$
- $\mathcal{A}(\forall x\, \phi) = \forall \hat{x}\, \mathcal{A}(\phi)$

With the definitions so far, one can prove: $\hat{\phi} \Rightarrow \mathcal{A}(\phi)$; in particular, $\hat{\mathcal{S}}_0 \Rightarrow \mathcal{A}(\mathcal{S}_0)$ and $\hat{\mathcal{R}} \Rightarrow \mathcal{A}(\mathcal{R})$ (approximation may introduce additional initial states and transitions).
Let the approximate abstract model be $M_a = (S_r, \mathcal{A}(\mathcal{S}_0), \mathcal{A}(\mathcal{R}), L_r)$. Then $M \preceq M_a$ (the approximate abstract model simulates the original).
If the abstraction function preserves the relations which correspond to primitive operations in a program, the abstraction $\mathcal{A}$ is exact.
An abstraction function $h_x$ defines an equivalence relation between the concrete values for $x$ which correspond to the same abstract value: $d_1 \sim_x d_2 \Leftrightarrow h_x(d_1) = h_x(d_2)$.
If the value of any primitive relation $P$ in the program is the same for any two tuples of equivalent concrete values:
$\forall d_1, \ldots, d_n, d_1', \ldots, d_n':\ \bigwedge_{i=1}^{n} d_i \sim_{x_i} d_i' \Rightarrow P(d_1, \ldots, d_n) = P(d_1', \ldots, d_n')$
then $M \equiv M_a$ (the abstract model is bisimilar to the concrete model).

Abstract interpretation: a method for defining the abstract semantics of a program that can be used to analyse the program and produce information about its runtime behavior [Cousot & Cousot
'77]. Consists in:
- a concrete domain $D$ and an abstract domain $A$, linked via a Galois connection:
- an abstraction function $\alpha : D \to A$
- a concretization function $\gamma : A \to \mathcal{P}(D)$ (associates to each abstract state a set of concrete states)
- $\forall x:\ x \subseteq \gamma(\alpha(x))$ and $\forall a \in A:\ a = \alpha(\gamma(a))$ (abstraction followed by concretization introduces approximation; concretization followed by abstraction is exact)
The majority of abstractions can be formulated in this general framework.

For arithmetic circuits/programs, the abstraction defined by $h(x) = x \bmod n$, $n \in \mathbb{Z}$, preserves primitive mathematical relations, because $((x \bmod n) + (y \bmod n)) \bmod n = (x + y) \bmod n$, etc.
Additionally (Chinese remainder theorem): if $n_1, \ldots, n_k$ are pairwise relatively prime, and $n = n_1 \cdot n_2 \cdots n_k$, then $x \equiv y \pmod{n} \Leftrightarrow \bigwedge_{i=1}^{k} x \equiv y \pmod{n_i}$
⇒ to verify 16-bit arithmetic, it suffices to verify the implementation for integers modulo 5, 7, 9, 11, 32 (product $5 \cdot 7 \cdot 9 \cdot 11 \cdot 32 = 110880 > 2^{16}$).

Symbolic abstraction: to verify the datapaths of a system (main function: computing and preserving values).
Example: correct transmission from $a$ to $b$; initially, for a fixed value: $\mathbf{AG}(a = 17 \to \mathbf{AX}\, b = 17)$
Abstraction function: $h(x) = 1$ if $x = 17$, $0$ otherwise.
More generally: we introduce the symbolic parameter $c$: $h_c(x) = 1$ if $x = c$, $0$ otherwise
⇒ abstract transition relation $\hat{R}(\hat{a}, \hat{a}', \hat{b}, \hat{b}', c)$
In a BDD representation, $c$ does not affect the complexity if the system behavior does not depend on $c$.
Example: pipelined adder with two stages: $\mathbf{AG}(reg1 = a \wedge reg2 = b \to \mathbf{AX}\, \mathbf{AX}\, sum = a + b)$

Compositional reasoning: an application of "divide and conquer" to verification of a system built from components
- verification of local properties of components
- deriving global properties from component properties
- without constructing a model of the entire
system (impractical)
Compositional reasoning: generic term for rules of the form
- $M_1 \models f_1 \wedge M_2 \models f_2 \Rightarrow Compose(M_1, M_2) \models LogicOp(f_1, f_2)$, e.g. $Compose = \parallel$ (parallel composition), and $LogicOp = \wedge$
- $M_1 \preceq M_2 \Rightarrow CompOp(M_1) \preceq CompOp(M_2)$, e.g. $\preceq$ = implementation, refinement; $CompOp(\cdot) = \cdot \parallel M$
- $M_1 \preceq S_1 \wedge M_2 \preceq S_2 \Rightarrow Compose(M_1, M_2) \preceq Compose(S_1, S_2)$

Let $M = (S, S_0, AP, L, R, F)$ and $M' = (S', S_0', AP', L', R', F')$.
Define the parallel synchronous composition $M'' = M \parallel M'$:
- $S'' = \{(s, s') \in S \times S' \mid L(s) \cap AP' = L'(s') \cap AP\}$
- $S_0'' = (S_0 \times S_0') \cap S''$
- $AP'' = AP \cup AP'$
- $L''(s, s') = L(s) \cup L'(s')$
- $R''((s, s'), (t, t')) = R(s, t) \wedge R'(s', t')$
- $F'' = \{(P \times S') \cap S'' \mid P \in F\} \cup \{(S \times P') \cap S'' \mid P' \in F'\}$
We use ACTL with fairness: for any ACTL formula $f$ we can construct a tableau $T_f$, and we have $M \models_F f \Leftrightarrow M \preceq_F T_f$
⇒ we can reason uniformly with formulas and models (tableaux)
(a) for any $M$ and $M'$: $M \parallel M' \preceq_F M$
(b) for any $M$, $M'$ and $M''$: $M \preceq_F M' \Rightarrow M \parallel M'' \preceq_F M' \parallel M''$
(c) for any $M$: $M \preceq_F M \parallel M$

We use the notation $\langle f \rangle M \langle g \rangle$: any system that satisfies the assumption $f$ and contains $M$ guarantees $g$ ($f$, $g$ are either formulas or models).
A typical reasoning structure: $\langle true \rangle M \langle A \rangle \wedge \langle A \rangle M' \langle g \rangle \wedge \langle g \rangle M \langle f \rangle \Rightarrow \langle true \rangle M \parallel M' \langle f \rangle$
Instantiated in concrete terms:
- $M$ = a complex transmitter
- $A$ = a simple model of a periodic transmitter
- $\langle true \rangle M \langle A \rangle$: $M$ behaves the same as $A$
- $M'$ = a receiver
- $g$ = "messages are picked up on time"
- $\langle A \rangle M' \langle g \rangle$ = $M'$ composed with $A$ picks up the messages on time
- $f$ = "there is no buffer overflow"
- $\langle g \rangle M \langle f \rangle$ = if $M$ is in a system that picks up the messages on time, there is no buffer overflow
⇒ no buffer overflow occurs in the system $M \parallel M'$

Proof:
(1) $M \preceq_F A$
(2) $M \parallel M' \preceq_F A \parallel M'$
(3) $A \parallel M' \models_F g$
(4) $A \parallel M' \preceq_F T_g$
(5) $M \parallel M' \preceq_F T_g$
(6) $M \parallel M \parallel M' \preceq_F T_g \parallel M$
(7) $T_g \parallel M \models_F f$
(8) $M \parallel M \parallel M' \models_F f$
(9) $M \preceq_F M \parallel M$
(10) $M \parallel M' \preceq_F$
$M \parallel M \parallel M'$
(11) $M \parallel M' \models_F f$
Justifications, in order: (1) hypothesis; (2) from (1) and compositionality (b); (3) hypothesis; (4) from (3) and the ACTL tableau property; (5) from (2), (4) and transitivity of $\preceq_F$; (6) from (5) and compositionality (b); (7) hypothesis; (8) from (6), (7) and $\preceq_F \Rightarrow \models_F$; (9) compositionality (c); (10) from (9) and compositionality (b); (11) from (8), (10) and $\preceq_F \Rightarrow \models_F$.

Example: a module (in: $n, d, q'$; out: $r$) updates the remainder: $r' = (r - q' \cdot d) \cdot b + \mathit{next\_digit}(n)$.
We want $M_q \parallel M_r$ to jointly satisfy the invariants $S_q$ and $S_r$.
Can we deduce from this that $M_q \parallel M_r \models S_q \wedge S_r$?

Assume-guarantee reasoning: studied in various contexts [Chandy & Misra '81, Abadi & Lamport '93].
We refer to Reactive Modules [Alur & Henzinger '95]:
- modules with input and output variables, and a transition relation
- dependence relation
- if $P_1 \wedge Q_1$ true at $0, 1, \ldots, t$ ⇒ $Q_2$ true at $t + 1$
- if $P_2 \wedge Q_2$ true at $0, 1, \ldots, t$ ⇒ $Q_1$ true at $t + 1$
- then for any $t$, $P_1 \wedge P_2 \Rightarrow Q_1 \wedge Q_2$

[Henzinger '01]: study of the theory of interfaces. For a refinement relation, two variants: • if $M_1$ ...

Input processing in C:
Check that data was read correctly: test the result of the input function (NOT just the value read).
Avoid overflow when reading strings and arrays: stop reading when the array limit is reached.
Buffer overflows ⇒ the system is vulnerable to attacks.
Unvalidated input may corrupt program data or let an attacker run code ⇒ some of the most serious errors.
A badly written program, or an ignorant programmer, is worse than no program(mer) at all!
You can only try to read data; the call may not succeed:
- system: no more data (end-of-file), read error, etc.
- user: data not in needed format (illegal char, not a number, etc.)
I/O functions report both a result and an error condition, in one of three ways:
1. return type extended to include an error code: getchar() returns an unsigned char converted to int, or EOF (-1), which is different from any unsigned char
2. return type may have a special invalid value: fgets returns the address where the line was read (first argument) or NULL (invalid pointer value) when nothing was read
3. return a count and store the data at a given pointer: scanf returns the number of items read (can be 0, or EOF at end-of-input) and takes as arguments the addresses where it should place the read data

Read a char: int getchar(void); use: getchar(), no parameters.
Returns an unsigned char converted to int, or EOF (negative, usually -1) if no char could be read.
int ungetc(int c, FILE *stream); puts a character c back into a given input stream (file); for standard input: ungetc(c, stdin);
DON'T unget more chars at once (effect not guaranteed); must read between successive calls to ungetc.

Write a char: int putchar(int c); writes an int, converted to unsigned char, to stdout; returns its value, or EOF (constant -1) on error.
DON'T putchar(EOF): -1 is converted to 255 (an actual char).
All input/output functions: in stdio.h (unless noted).

char *fgets(char *s, int size, FILE *stream);
Reads up to and including newline \n, max size-1 characters; stores the line in array s and adds '\0' at the end.
char tab[80]; if (fgets(tab, 80, stdin)) { ... }
Third parameter to fgets indicates the file from which to read: stdin (stdio.h) is standard input (keyboard unless redirected).
fgets returns NULL if nothing was read (end-of-file); if successful, it returns the address passed as argument (thus non-null)
⇒ Test the result to find out if the read was successful:
char s[81]; while (fgets(s, 81, stdin)) printf("%s", s);
A line with > 80 chars will be read and printed piecewise (OK!)
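The fgets pattern above can be wrapped in a small helper. This is a minimal sketch, not code from the slides: the name read_line and the decision to strip the trailing newline are assumptions made for illustration. It takes a FILE * so that it works on stdin as well as on any other stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the slides): reads one line of at most
 * size-1 characters from 'in' into buf using fgets.
 * Returns 1 if something was read, 0 at end-of-file or on error.
 * A trailing '\n', if present, is removed. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)   /* NULL: nothing was read */
        return 0;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[len - 1] = '\0';            /* drop the newline fgets stored */
    return 1;
}
```

A typical use would be a loop such as while (read_line(stdin, s, sizeof s)) puts(s); a line longer than the buffer is then returned in several pieces, as described above.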
More complex: can test if the read line was truncated:
int c; char s[81];
if (fgets(s, 81, stdin))
  if (strlen(s) == 80 && s[79] != '\n' && ((c = getchar()) != EOF)) { printf(..., s); ungetc(c, stdin); }
  else printf(..., s);

gets: removed by the C11 standard ⇒ it is no longer a standard function: it did not limit the size read, so it was impossible to use safely.

int puts(const char *s); prints string s followed by newline \n: puts(...);
int fputs(const char *s, FILE *stream); prints string s to the given output stream: fputs(..., stdout);
fputs(s, stdout); is like printf("%s", s); prints string s as is, without an additional newline.
stdout is standard output (screen unless redirected).
puts and fputs return EOF on error, nonnegative on success.

int printf(const char *format, ...);
Functions with a variable number of parameters: discussed later.
First parameter: the format string; may contain usual characters (are printed) and format specifiers: % and a letter:
%c char, %d, %i decimal, %e, %f, %g real, %o octal, %p pointer, %s string, %u unsigned, %x hexadecimal
Remaining parameters: expressions, their values are printed; their number and type must correspond to the format specifiers.
Result: number of characters printed (usually not used, ignored).
Example: printf(..., 3, sqrt(3));

int scanf(const char *format, ...);
First arg: format string, with format specifiers.
Remaining parameters: addresses where to store the read values.
Need addresses, NOT necessarily & (one way to get addresses). DON'T use & for strings: the array name is already its address.
Returns the number of objects read (assigned) (NOT their value!) or EOF when an error or end-of-file is reached before anything is read.
Read one integer: int n; if (scanf("%d", &n) == 1) printf(...); else puts(...);
Format specifiers: like for printf: %u unsigned, %o octal, %x hexadecimal; %i any int format.
Reading numbers skips any initial whitespace.
Like in printf, can combine arbitrary formats (result: number of objects read successfully):
if (scanf(..., &x, &y) != 2) { ... } else { ... }
Format letter s: for reading a word (string WITHOUT whitespace); it cannot read a sentence such as "This is a test"; to read a line, use fgets.
Arrays are ALWAYS limited!
⇒ give max length (a constant) between % and s, one less than the array length; scanf will add \0.
NO: scanf("%s", word) with no length: leads to buffer overflow.
char word[...]; if (scanf(..., word) == 1) printf(..., word);
scanf with s format skips initial whitespace: \t \n \v \f \r and space, as checked by isspace().
Array names are addresses, DON'T use &.
Format %s reads a word (up to whitespace).

For repeated processing (while input matches format), write: while (read succeeds) { ... }
while (fgets(...)) { ... }
while ((c = getchar()) != EOF) { ... }
while (scanf(...) == how-many-to-read) { ... }
On loop exit check: end-of-file? (nothing more), or (format) error.
int feof(FILE *stream); returns nonzero if end-of-file was reached for the given stream.
if feof(stdin): input is finished; else: input does not match format ⇒ read next char(s) and report.
DON'T use feof in the read loop: while (!feof(stdin)) scanf("%d", &n);
After the last good read (number), end-of-input is not yet reached unless there is nothing more (no whitespace, newline) after it; the next read will not succeed, but is not checked.

Error handling. Simplest: exit, from stdlib.h (primitive, but ends the program).
Can write an error function that prints a message and calls exit():
void fatal(const char *msg) { fputs(msg, stderr); exit(EXIT_FAILURE); }
We can then use this function for read: if (scanf("%d", &n) != 1) fatal(...);
Good practice: writing to stderr can separate errors from output (using redirection).

scanf does not consume non-matching input ⇒ it must be consumed before trying again:
int c, m, n; printf(...);
while (scanf(..., &m, &n) != 2) {
  for ( ; (c = getchar()) != '\n'; ) if (c == EOF) exit(1);
  printf(...);
}

Often, we have to fill an array up to some stopping condition:
- read from input up to a given character (period, \n, etc.)
- copy from another string or array
Arrays must not be written beyond their length!
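The rule "stop at the stopping condition or at the array limit, whichever comes first" can be sketched as follows. The function name read_until and its interface are assumptions chosen for illustration, not taken from the slides.

```c
#include <stdio.h>

/* Hypothetical helper: copies characters from 'in' into buf until the
 * stop character or end-of-file is met, or until size-1 characters are
 * stored. If the stop character is reached, it is consumed but not
 * stored. buf is always terminated with '\0'.
 * Returns the number of characters stored. */
size_t read_until(FILE *in, char *buf, size_t size, int stop)
{
    size_t i = 0;
    int c;                      /* int, not char: must be able to hold EOF */
    if (size == 0)
        return 0;               /* no room even for the terminator */
    /* the limit is tested first, so no character is read when buf is full */
    while (i < size - 1 && (c = getc(in)) != EOF && c != stop)
        buf[i++] = (char)c;
    buf[i] = '\0';
    return i;
}
```

The limit test comes before the read on purpose: when the buffer is full, the next character stays in the stream and can be picked up by a later call.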
for (i = 0; i ... (the rest of this loop is lost in the extracted text)
Date example: ... d=5, m=11, y=2013 (see later how to enforce 2 or 4 digits)

scanf reads until the input does not match the format. Non-matching chars are not read; the remaining variables are not assigned.
scanf("%d%d", &x, &y); input: 123A; returns 1; x = 123, y: unchanged; input rest: A
scanf("%d%x", &x, &y); input: 123A; returns 2; x = 123, y = 0xA (10)

Scansets: allowed characters between [ ] (ranges: with -). Reading stops at the first disallowed character.
char a[33]; if (scanf("%32[A-Za-z]", a) == 1) ... max 32 letters
char num[...]; if (scanf(..., num) == 1) ... string of digits
Give max length between % and [ ].
Reading a string like above, but use ^ after [ to specify disallowed chars:
char t[...]; if (scanf(..., t) == 1) ... reads up to period or newline
Format is %[ ], NOT with s: %20[A-Z]s is wrong.

One character:
int c = getchar(); if (c != EOF) { ... }
int c; while ((c = getchar()) != EOF) { ... }
With scanf (use char, not int; useful for arrays):
char c; if (scanf("%c", &c) == 1) { /* read OK */ }

Reading a fixed number of chars:
char tab[80]; scanf("%80c", tab); reads EXACTLY 80 chars (including whitespace), DOES NOT add '\0' at end ⇒ can't know if all were read.
Check how many were read by initializing with zeroes and testing the length (or with the %n format, see later):
char tab[81] = ""; scanf("%80c", tab); int len = strlen(tab);

Whitespace:
Numeric formats and %s consume and ignore initial whitespace: "%d%d" reads two ints separated and possibly preceded by whitespace.
In formats %c, %[ ], %[^ ] whitespace characters are normal chars (not ignored).
A space in the format consumes any amount (≥ 0) of whitespace in the input:
scanf(" "); consumes whitespace until the first non-space char
"%c %c" reads a char, consumes ≥ 0 whitespace, reads another char
"%d %d" is like "%d%d" (whitespace allowed anyway)
"%d ": a space after the number consumes ALL whitespace (including newlines!)
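The matching rules above can be tried out without typing input by using sscanf, which applies the same format rules to a string. The following is a small sketch; the function names count_two_ints and read_letters are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read two decimal ints from text.
 * Returns what sscanf returns: the number of items assigned (0, 1 or 2),
 * or EOF if the input ends before anything could be converted. */
int count_two_ints(const char *text, int *x, int *y)
{
    return sscanf(text, "%d%d", x, y);
}

/* Hypothetical helper: reads up to 32 letters into word, which must
 * have room for 33 chars (32 letters plus '\0').
 * Returns 1 if at least one letter was read, 0 if the first character
 * is not a letter, EOF on empty input. */
int read_letters(const char *text, char word[33])
{
    return sscanf(text, "%32[A-Za-z]", word);
}
```

With the input 123A, count_two_ints assigns only the first int and returns 1, leaving the second variable untouched, which is why the result must be compared with the number of items wanted.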
Consume whitespace, but not newline \n: scanf(...);
The * modifier means consume and ignore (no address is given).
To consume and ignore (skip) data with a given format: use * after % without specifying an address where to read
⇒ scanf reads according to the pattern, but does not store the data and does not count it in the result (number of read objects).
Example: text with three grades and average, need just the average: if (scanf(..., &avg) == 1) { ... }
Example: consume the rest of the line: scanf("%*[^\n]"); if (getchar() == EOF) { ... }

A number between % and the format character limits the count of chars read:
%4d: int, at most 4 chars (initial spaces don't count, sign does!)
scanf(..., &m, &n); input 12 34: m=12 n=34
scanf("%2d%2d", &m, &n); input 12345: m=12 n=34, rest: 5
scanf(..., &m, &n); input 12 34: m=12 n=34
scanf(..., &x); input 12.34: x=12.34
scanf("%d%x", &m, &n); input 123a: m=123 n=0xA

scanf format specifiers:
%d: signed decimal int
%i: signed decimal, octal (0) or hexadecimal (0x, 0X) int
%o: octal (base 8) int, optionally preceded by 0
%u: unsigned decimal int (accepts negative and converts)
%x, %X: hexadecimal int, optionally with 0x, 0X
%c: any char, including whitespace
%s: string of chars, until first whitespace; '\0' is added
%[...]: string of indicated allowed characters
%[^...]: string except indicated disallowed chars
String formats should have a constant maximal length, unless assignment is suppressed with *.
%a, %A, %e, %E, %f, %F, %g, %G: real (possibly with exponent)
%p: pointer, as printed by printf
%n: writes into the argument (int *) the count of chars read so far; does not read; does not add to the count of read objects (return value)
%%: percent character

printf format specifiers:
%d, %i: signed decimal int
%o: unsigned octal int, without initial 0
%u: unsigned decimal int
%x, %X: hexadecimal int, without 0x/0X; lower/upper case
%c: character
%s: string of characters, up to '\0' or the indicated precision
%f, %F: real without exponent; 6 decimal digits; no dot if 0 precision
%e, %E: real with exponent; 6 decimal digits; no dot if 0 precision
%g, %G: real, like %e, %E if the exponent is < -4 or ≥ precision; else like %f. Does not print zeroes or decimal point if useless
%a, %A: hexadecimal real with decimal exponent of 2: 0xh.hhhhp±d
%p: pointer, usually in hexadecimal
%n: writes into the argument (int *) the count of chars written so far
%%: percent character

Format specifiers may have other optional elements: % flag size . precision modifier type
Flags:
*: field is read but not assigned (is ignored) (scanf)
-: aligns value left for given size (printf)
+: + before positive number of signed type (printf)
space: space before positive number of signed type (printf)
0: left-filled with 0 up to indicated size (printf)
Modifiers:
hh: argument is char (for d i o u x X n formats): char c; scanf("%hhd", &c); in: 123 → c = 123 (1 byte)
h: argument is short (for d i o u x X n formats), e.g. %hd
l: argument is long (formats d i o u x X n) or double (formats a A e E f F g G): long n; scanf("%ld", &n); double x; scanf("%lf", &x);
ll: argument is long long (for d i o u x X n formats)
L: argument is long double (for a A e E f F g G formats)
Size: an integer
scanf: maximal character count read for this argument
printf: minimal character count for printing this argument; right aligned and filled with spaces, or according to modifiers
Precision: only in printf; dot optionally followed by an integer (if only dot, precision is zero)
- minimal number of digits for d i o u x X (filled with 0)
- number of decimal digits (for a A e E f F)
- significant digits (for g G)
printf("%7.2f", 15.234); prints |  15.23| (2 decimals, 7 total)
- maximal number of chars to print from a string (for s): char m[3] = ...; printf("%.3s", m); (for a string without '\0')
In printf, can have * instead of size and/or precision. Then, size/precision is given by the next argument: printf("%.*s", max, s); prints at most max chars.

Floating point numbers in various formats: eight printf calls (format strings lost in the extracted text) with the values 1.0/1100, 1.0/1100, 1.0/11000, 1.0, 1.0, 1.0, 1.009, 1.009.

Writing integers in table form:
printf("|%4d|", -12); prints | -12|
printf("|%4d|", 12); prints |  12|
printf("|%-4d|", -12); prints |-12 |
printf("|%06d|", -12); prints |-00012|
printf("|%+4d|", 12); prints | +12|

Write 20 characters (printf returns the count of written chars):
int m, n, len = printf(..., m); printf("%*d", 20-len, n);
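The size, precision and * elements can be checked with snprintf, which formats exactly like printf but writes into a buffer. This is a sketch for experimentation; the helper names format_real and format_prefix are assumptions, not part of the slides.

```c
#include <stdio.h>

/* Hypothetical helper: formats v in a field of 'width' characters with
 * 'prec' decimals, both taken from arguments via '*', between two '|'
 * marks. Returns the value reported by snprintf: the number of
 * characters needed, not counting the final '\0'. */
int format_real(char *out, size_t size, int width, int prec, double v)
{
    return snprintf(out, size, "|%*.*f|", width, prec, v);
}

/* Hypothetical helper: writes at most max characters of s into out,
 * using a precision given as argument ("%.*s"). */
int format_prefix(char *out, size_t size, int max, const char *s)
{
    return snprintf(out, size, "%.*s", max, s);
}
```

Because the return value is the number of characters produced, the same call can also be used to compute how much padding is still needed to fill a fixed-width column.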
Two characters separated by a single space (consumed by %*1[ ]):
char c1, c2; if (scanf("%c%*1[ ]%c", &c1, &c2) == 2) ...
Read an int with exactly 4 digits:
int n1, n2, x; if (scanf(..., &n1, &x, &n2) == 1 && n2 - n1 == 4) ...
%n counts read chars; store the counters in n1, n2, then subtract.
Read/check for a word that must appear:
int nr = 0; scanf(..., &nr); if (nr == 7) { ... } else { ... }
Ignore up to (and excluding) a given char (\n): scanf("%*[^\n]");
Test for the right number of read objects, not just nonzero!
if (scanf("%d", &n) == 1), not just if (scanf("%d", &n)): scanf may also return EOF, which is nonzero!
For integers, test overflow using errno:
if (scanf("%d", &x) == 1) if (errno == ERANGE) { printf(...); errno = 0; }

ERRORS with scanf:
NO: if (scanf(...)). DON'T test for a nonzero result: it could be > 0 (items read), or -1 (EOF), nothing read!
YES: if (scanf(..., ...) == how-many-items-wanted)
NO: scanf("%...[...]s", buf). The format is [], not []s.
YES: if (scanf("%...[...]", buf) == 1)
NO: scanf("%s,%d", name, &grade). The s format reads everything non-whitespace, so it won't stop at the comma.
YES: if (scanf("%...[^,],%d", name, &grade) == 2) to read a string with no comma (all else allowed, including whitespace), the comma, and a number.

Marius Minea, marius@cs.upt.ro, 22 November 2017

Phrack 49, Volume Seven, Issue Forty-Nine, File 14 of 16. BugTraq, r00t, and Underground.Org bring you: Smashing The Stack For Fun And Profit, by Aleph One, aleph1@underground.org

"Mayhem" Declared Preliminary Winner of Historic Cyber Grand Challenge (OUTREACH@DARPA.MIL, 8/4/2016)
Capping an intensive three-year push to spark a revolution in automated cyber defense, DARPA today announced that a computer system designed by a team of Pittsburgh-based researchers is the presumptive winner of the Agency's Cyber Grand Challenge, the world's first all-machine hacking tournament.

Then: intuition, creativity, a debugger.
Now: a debugger is not enough; lots of automation:
- constraint / satisfiability checking
- modeling of instruction semantics (specialized platforms)
- intelligent combination of different techniques
- skills for performance

Fuzzing: lightweight technique,
evolves inputs; aims for input variety, high statement/branch coverage.
Symbolic execution: more expensive, analyzes program control flow; attempts path coverage; find a path to the bug, then synthesize an exploit.

Fuzzing: American Fuzzy Lop (AFL)
Fuzzer by Michal Zalewski, http://lcamtuf.coredump.cx/afl/
Active development, scores of bugs found in key software.
(Screenshot: status screen of american fuzzy lop 0.47b running on readpng; run time 0 days, 0 hrs, 4 min, 43 sec; last new path 26 sec ago; last uniq crash: none seen yet; now processing: 38 (19.49%); paths timed out: 0 (0.00%).)

Coverage measurement:
A -> B -> C -> D -> E and A -> B -> D -> C -> E have different transition pairs (C, D) and (D, C).
Transition coverage provides more info than basic block coverage; also self-loops A -> A (tight program loops).
Can't record exhaustively ⇒ do some hashing for compression:
cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;
AFL keeps a 64 kB map of branch pairs.

Input minimization: try to remove blocks of data from the input, check if coverage is kept.
Mutation strategies:
- Sequential bit flips: flip 1-4 bits, stepping one bit at a time; yield: 70 (single flip) down to 10 new paths per million; expensive (one execve() for each bit of input)
- Sequential byte flips (1-4 bytes)
- Simple arithmetic: incr/decr integer values (small inc, ±35)
- Known integers: can trigger edge conditions in typical code (-1, 256, 1024, MAX_INT, etc.)
- Stacked tweaks: stacked bit flips, insertions, deletions
- Splicing: take two inputs differing in ≥ 2 places, splice at some midpoint, then do nondeterministic tweaks; usually ~20% more execution paths

To fuzz structured input, can start with a dictionary of keywords.
Even random keyword combinations yield interesting valid SQL:
select sum(1)LIMIT(select sum(1)LIMIT -1,1);
select round( -1) ...

Input analysis (afl-analyze output):
[+] Read 38 bytes from "test".
[*] Performing dry run (mem limit = 25 MB, timeout = 1000 ms)
[*] Analyzing input file (this may take a while)
Legend:
- no-op block
- superficial content
- critical stream
- "magic value" section
- suspected length field
- suspected cksum or magic int
- suspected checksummed block
(Hexdump of the 38-byte test input, containing the strings "hello", "cruel", "world", "goodbye", with each byte range marked by its class.)
[+] Analysis complete; interesting bits: 15.79% of the input file.
[+] We're done here. Have a nice day!

"No-op blocks": no apparent control flow change (data payload)
"Superficial content": some control flow changes (strings in rich documents)
"Critical stream": control flow altered in correlated ways (keywords, magic values, compressed data)
"Suspected length field": small int causing control flow change
"Suspected cksum or magic int"
"Suspected checksummed block"
"Magic value section"

Fork server: for libraries, the usual fuzzing approach is with a simple client program, but: overhead for execve(), linker, library initialization routines.
Idea: modify the binary to stop after all initialization, before the main code.
On command from the fuzzer, fork() a clone of the already-loaded program; fast due to copy-on-write.

Symbolic execution
Described since the mid-seventies (James C. King 1976, others).
The program is executed by a special interpreter, using symbolic inputs ⇒ results in a symbolic execution tree.
Each branch: path condition, as a formula over symbolic variables.
Tree traversal stops when the path condition becomes unsatisfiable.
Can be used to generate tests attaining high coverage, or to try to reach a specific branch.
Successful mature technique, hundreds of papers, many tools: Java Pathfinder, (j)CUTE, Crest, KLEE, Pex, SAGE, for C/C++, C#, Java, more recently JavaScript.

Symbolic Execution Example
int a = α, b = β, c = γ; // symbolic
int x = 0, y = 0, z = 0;
if (a) { x = -2; }
if (b < ... (the rest of this example and the start of the next one are lost in the extracted text)

Concolic execution example: a branch tests x + y > 0, where y = hash(x).
Assume: x = 20; y = hash(20) = 13; the branch x + y > 0 is taken.
To reach the other branch, negate x + y > 0, with concrete y (y = 13).
The solver might return, e.g., x = -15; if lucky, -15 + hash(-15) ≤ 0.
Worst-case: degrades to random testing.

Cadar, Dunbar, Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, OSDI 2008 (best paper)
- 90%+ coverage on coreutils + busybox
- 56 serious bugs in 430 kloc, some
bugs 15 years old
- simple crash inputs generated for several programs
- based on LLVM infrastructure (analyzes LLVM bitcode)
- lots of engineering work: path exploration heuristics; efficient branching due to copy-on-write; models for library functions, file system, etc.

SAGE: Whitebox Fuzzing for Security Testing (Ella Bounimova, Patrice Godefroid, David Molnar)
Impact, since 2007:
- 500+ machine years (in the largest fuzzing lab in the world)
- 3.4 billion+ constraints (largest SMT solver usage ever!)
- 100s of apps, 100s of bugs (missed by everything else)
- Ex: ... of Win7 WEX security bugs found by SAGE
- Bug fixes shipped quietly (no MSRCs) to 1 billion+ PCs
- Millions of dollars saved (for Microsoft and the world)
- SAGE is now used daily in Windows, Office, etc.
(Chart: how bugs were found (Win7 WEX Security): Regression + all others; SAGE; Random testing.)

From vulnerabilities to exploits
Easy to find functions which are unsafe.
For cases which may be unsafe, must decide:
1) is it really a bug?
2) can it be exploited?
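As an illustration of a call that "may be unsafe", consider strcpy: it performs no bounds checking, so whether a particular call is a bug depends on what is known about the source length at that point. The sketch below uses an invented function name, copy_checked, and is only meant to show the kind of guard an analysis would need to find before it can rule out an overflow.

```c
#include <string.h>

/* Hypothetical example: copies src into dst only if it fits.
 * strcpy itself never checks sizes; here the length is verified first,
 * so this particular call cannot write past the end of dst.
 * Returns 1 if the copy was made, 0 if src (with its '\0') does not fit. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) >= dst_size)   /* no room for the text plus '\0' */
        return 0;
    strcpy(dst, src);              /* safe here: length verified above */
    return 1;
}
```

Without the length test the same strcpy call would be a real bug whenever an attacker controls src, and the remaining question would be whether the overflow can actually be exploited.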
S. Heelan, Automatic generation of control flow hijacking exploits for software vulnerabilities, MSc thesis, Oxford, 2009
Two steps:
- generate input that executes an exploitable program path
- express conditions necessary to transfer control to shellcode

Avgerinos, Brumley et al.: Automatic Exploit Generation, NDSS 2011; Unleashing Mayhem on Binary Code, IEEE S&P 2012
- applied large-scale to Debian code
- can generate buffer overflow and format string attacks (form constraints on symbolic instruction pointer / format string)

Checking Debian for bugs:
- 37,000 programs
- 16 billion verification queries
- ... test cases
- 2,606,000 crashes
- 14,000 unique bugs
- 152 exploits
* [ARCB, ICSE 2014, ACM Distinguished Paper], [ACRSWB, CACM 2014]
(slide: David Brumley, CMU, 2015)

Driller: Augmenting Fuzzing Through Selective Symbolic Execution
Stephens, Kruegel, Vigna et al. (UC Santa Barbara), NDSS 2016
Key insight: fuzzing is cheap, with good overall coverage.
Symbolic execution: expensive, path explosion, but can pass through a precise, complex condition.

int check(char *x, int depth) {
  if (depth >= 100) {
    return 0;
  } else {
    int count = (*x == 'B') ?
1 : 0; count += check(x + l, depth + 1); return count; } } int main(void) { char x ; read(0, x, 100); if (check(x, 0) == 25) vulnerable () ; } Listing 4 A program that causes a path explosion under concolic execution Fig 10 Graph visualizing the progresa made by DriHer in discovering new compartments Each node is a function; each edge is a function caii, but return edggs are excluded to maintain legibility Node "A" is the entry point Node "B" contains a magic number check that requires the symbolic execution component to resolve Node "C" contains another magic number check Fuzzing helps explore a "compartment" efficiently Symbolic execution finds "door" between compartments o Compartment 1 O Compartment 2 Q Compartment 3 Ф Compartment 4 Fig 11 The sequence of compartments throu^i which execution fiows for a trace of the crashing input for CGC application 2b03cf01 DriHer's ability to "break into" the fourth compartment (represented by the black nodes) was criticai for generating the crashing input The generated, derandomized crashing input was "A x00 x00 x00 x00 x00 x00 x00 x9c6 x00 x00 xl8 x04 x00 x00 xl8' x00 x00A x00 x00 x00 x00 x00 x00 x00 x9c6 x00 x00 xl9 x04 x00  x00 xl4 x00 x00 x00A x00 xf8 xff xff xec x00d x96X x0c x00 x06 x08 x00 x00 xl0 x00 x00 x00A x00 x00 x00 x00 x00 x00 xfb x96X  x0c x00 x02 x08 x00 x00 xl8' x00 x00A x00 xebA x00 x00d x96X x0c x00 x06" The full exploit specification, conforming to the DARPA CGC exploit specification format and accounting for randomness, is available in Appendix A G Case Study Time (seconds) Fig 9 For the binary 2b03cf01, which Driller crashed in about 2 25 hours, this graph shows the number of basic blocks found over time Each line represents a different number of invocations of symbolic execution from zero to three invocations After each invocation of symbolic execution, the fuzzer is able to tind more basic blocks Method Crashes Found Fuzzing 68 Fuzzing A Driller 68 Fuzzing П Symbolic 13 Symbolic 16 Symbolic П Driller 
16 Driller 77 Fascinating work, rapid reaction, spectacular advance strong reliance on theory logic (advances in SMT solvers) open-source platforms (angr, BAP, BiNSEC, etc ) engineering, performance, integration of techniques More good reads: Yan Shoshitaishvili, Ruoyu Wang, Ch Kruegel, G Vigna et al : (State of) The Art of War: Offensive Techniques in Binary Analysis, iEEE S&P 2016 describes http:  angr io  platform from UCSB Program semantics, analysis and verification Program semantics, analysis and verification Program semantics analysis and verification Formal verification Lecture 8 December 8, 2005 Marius Minea [Nielsen & Nielsen, Semantics with Applications, Wiley 1992, 1999] Semantics = describing the meaning (behavior of programs) formally: express the meaning in terms of a mathematical model semantics: describes the computation is executed (effects of a statement on the program state) - natural (big-step) operational semantics: overall execution - structural (small-step) operational semantics: effect composed from individual statements semantics: describes of program construct typically as function; not how execution is done - direct style: meaning of standalone construct - continuation style: meaning if followed by a given continuation semantics: assertions about effect of executing the program (can focus on some properties of interest) - partial correciness' what is true if program terminates - total correciness' also expresses when program terminates Formal verification Lecture 8 Marius Minea - first practicai successes of formal verification were for hardware - but started by formalizing programming language semantics " an adequate basis for formal definitions of the meanings of programs [ ] in such a way that a rigorous standard is established for proofs" "if the initial values of the program variables satisfy the relation R±, the final values on completion will satisfy the relation R2 " - method: annotating a program (or flow graph) with assertions 
- introduces the notion of verification condition: a formula $V_c(P, Q)$ such that if P is true before executing c, then Q is true upon exit
  and [...] for a program + initial condition
- very general approach: assertions expressed in first-order logic
- develops general rules for combining verification conditions and specific rules for different instruction types
- explicitly introduces invariants for reasoning about loops
- handles termination using a positive decreasing measure

- like Floyd, handles preconditions and postconditions for executing an instruction, but the notion of Hoare triple better displays the relation between the statement and the two assertions
- works with source programs, not flow graphs
- Notation: {P} S {Q}: if S is executed in a state that satisfies P and it terminates, the resulting state satisfies Q
- Later: similar reasoning for [P] S [Q]: if S is executed in a state that satisfies P, then it terminates and the resulting state satisfies Q
Rigorous application: C. A. R. Hoare, Proof of a Program: FIND (1971)

- rules defined for each statement type individually; by combining them, we can reason about entire programs
Assignment: $\{Q[x/E]\}\ x := E\ \{Q\}$, where $Q[x/E]$ is the substitution of x with E in Q
Example: {x = y - 2} x := x + 2 {x = y}
(in x = y, substitute x with the assigned expression, x + 2, and obtain x + 2 = y, thus x = y - 2)
Writing the rule "backward" (P as a function of Q) simplifies it
Sequencing: $\dfrac{\{P\}\ S_1\ \{Q\} \qquad \{Q\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$
Conditional: $\dfrac{\{P \land E\}\ S_1\ \{Q\} \qquad \{P \land \lnot E\}\ S_2\ \{Q\}}{\{P\}\ \text{if } E \text{ then } S_1 \text{ else } S_2\ \{Q\}}$

Loops: the invariant is key in reasoning about programs
- must find an invariant I = a property which stays true before / after each loop iteration
- if the loop is entered (E), the invariant is maintained after loop body S
- if the loop is not entered ($\lnot E$), the invariant implies postcondition Q
$\dfrac{\{I \land E\}\ S\ \{I\} \qquad I \land \lnot E \Rightarrow Q}{\{I\}\ \text{while } E \text{ do } S\ \{Q\}}$

    while (lo [...] m)   /* both cases maintain lo [...] m => n >= m+1 => n >= lo */
    else hi = m;         /* !(n [...] n [...] n [...] lo==n && n==hi */

Consider {P} *x = 2 {v + *x = 4}. What is the precondition P?
Correct answer: $v = 2 \lor x = \&v$
But using the simple rule, $(v + {*x} = 4)[{*x}/2]$ misses the second case => we must model memory
m = memory, a = address, d = data
Consider functions rd(m, a), returning d, and wr(m, a, d), returning m'
We have the rule:
$rd(wr(m, a_1, d), a_2) = \begin{cases} d & \text{if } a_2 = a_1 \\ rd(m, a_2) & \text{if } a_2 \neq a_1 \end{cases}$
We must deduce a property of memory m from the relation:
$rd(wr(m, x, 2), \&v) + rd(wr(m, x, 2), x) = 4$
$rd(wr(m, x, 2), \&v) + 2 = 4$
$rd(wr(m, x, 2), \&v) = 2$
$(x = \&v \land 2 = 2) \lor (x \neq \&v \land rd(m, \&v) = 2)$
$x = \&v \lor v = 2$

E. W. Dijkstra, Guarded Commands, Nondeterminacy and Formal Derivation of Programs (1975)
- for a given statement S and postcondition Q there can be several preconditions P such that {P} S {Q} or [P] S [Q]
- Dijkstra calculates a precondition wp(S, Q) for successful termination of S with postcondition Q
- necessary (weakest): if [P] S [Q] then $P \Rightarrow wp(S, Q)$
- wp is a predicate transformer (transforms post- into precondition)
- allows the definition of a calculus with such transformers
Assignment: $wp(x := E, Q) = Q[x/E]$ (see Hoare's rule)
Sequencing: $wp(S_1; S_2, Q) = wp(S_1, wp(S_2, Q))$
Conditional: $wp(\text{if } E \text{ then } S_1 \text{ else } S_2, Q) = (E \Rightarrow wp(S_1, Q)) \land (\lnot E \Rightarrow wp(S_2, Q))$
For iteration, we need a recursive computation
Define $wp_k$, assuming the loop terminates in at most k iterations:
$wp_0(\text{while } E \text{ do } S, Q) = \lnot E \Rightarrow Q$ (loop not entered)
$wp_{k+1}(\text{while } E \text{ do } S, Q) = (E \Rightarrow wp(S, wp_k(\text{while } E \text{ do } S, Q))) \land (\lnot E \Rightarrow Q)$ (1 iteration followed by [...])
can be written as a fixpoint formula
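The wp rules for assignment, sequencing, conditionals and bounded loop unrolling can be tried out on small examples. The following is only a sketch under stated assumptions: predicates and expressions are modelled as Python functions over a state dictionary (a semantic encoding, not the syntactic substitution Q[x/E]), and the helper names `assign`, `seq`, `cond` and `while_k` are illustrative, not taken from the slides.

```python
# Semantic sketch of Dijkstra's wp. A state is a dict from variable names
# to integers; predicates and expressions are functions of the state.

def assign(var, expr):
    """wp(var := expr, Q) = Q[var/expr]: evaluate Q in the updated state."""
    return lambda post: (lambda s: post({**s, var: expr(s)}))

def seq(s1, s2):
    """wp(S1; S2, Q) = wp(S1, wp(S2, Q))."""
    return lambda post: s1(s2(post))

def cond(guard, s1, s2):
    """wp(if E then S1 else S2, Q) = (E => wp(S1,Q)) and (not E => wp(S2,Q))."""
    return lambda post: (lambda s: s1(post)(s) if guard(s) else s2(post)(s))

def while_k(guard, body, k):
    """wp_k for `while guard do body`, following the recursive definition."""
    def transformer(post):
        pre = lambda s: guard(s) or post(s)            # wp_0 = (not E => Q)
        for _ in range(k):
            # wp_{k+1} = (E => wp(S, wp_k)) and (not E => Q)
            pre = (lambda prev: lambda s: body(prev)(s) if guard(s) else post(s))(pre)
        return pre
    return transformer

# Example from the assignment rule: {x = y - 2} x := x + 2 {x = y}
add_two = assign("x", lambda s: s["x"] + 2)
pre = add_two(lambda s: s["x"] == s["y"])

# Loop example (hypothetical): while x < 3 do x := x + 1, postcondition x = 3
count_up = while_k(lambda s: s["x"] < 3, assign("x", lambda s: s["x"] + 1), 3)
loop_pre = count_up(lambda s: s["x"] == 3)
```

Here `pre` holds exactly in the states where x = y - 2, which matches the precondition obtained by substitution in the example above.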
We know P before the loop; we wish to find Q after execution
How do we establish an invariant I of the loop in order to prove Q?
I must satisfy the following conditions:
- $P \Rightarrow I$ (I sufficiently weak to hold initially)
- $\{I \land E\}\ S\ \{I\}$ (I is an invariant)
- $I \land \lnot E \Rightarrow Q$ (I sufficiently strong to be useful)
Determining loop invariants is difficult:
Trivial example: {x [...] 10, but not as useful)
Usually: iterative calculation (fixpoint); sometimes needs invariant strengthening

Using Floyd-Hoare-style reasoning, we can express properties as predicates over the state variables of the program
- same as, e.g., atomic propositions were defined in Spin
- sample predicates: x > 0, lock = 1, x + 1 [...]
We need a (possibly approximate) method for backward / forward exploration in the abstract state space

General framework:
- symbolic approach, with state sets represented as formulas
  $post(r, t) = \{ s' \mid \exists s \in r.\ s \xrightarrow{t} s' \}$: successor of region (state set) r
- we seek the abstract operator $post^a(r, t) = \alpha(post^c(\gamma(r), t))$
- in general, this computation is infeasible / expensive in practice (particularly the abstraction operation $\alpha$)
  => abstractions with different kinds of approximation

- each predicate represented in disjunctive normal form (as disjunctions of monomials $\varphi$)
  monomial = conjunction (product) of predicates $p_i$ or their negation $\lnot p_i$
- successor $post^a(\varphi, t)$ for the transition (statement) t also approximated by a monomial
  => we determine for each predicate if the monomial contains $p_i$ or $\lnot p_i$ (or none)
  => we determine for each predicate $p_i$ if $post^a(\varphi, t)$ implies $p_i$ or $\lnot p_i$, i.e., if $\varphi \Rightarrow wp(p_i, t)$ or $\varphi \Rightarrow wp(\lnot p_i, t)$
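The monomial approximation of the successor can be illustrated on a toy state space. This is a minimal sketch, not the procedure of any particular tool: the two predicates reuse the sample predicates x > 0 and lock = 1, the statements are hypothetical, and a brute-force enumeration over a small finite range of states stands in for the decision procedure that a real implementation would call.

```python
from itertools import product

# Illustrative predicates over a state {"x": int, "lock": int}.
PREDICATES = {
    "x>0": lambda s: s["x"] > 0,
    "lock=1": lambda s: s["lock"] == 1,
}

# Assumption: a small finite domain replaces a real decision procedure.
STATES = [{"x": x, "lock": lock} for x, lock in product(range(-3, 4), (0, 1))]

def holds(monomial, state):
    """A monomial maps predicate names to the truth value it requires."""
    return all(PREDICATES[name](state) == value for name, value in monomial.items())

def abstract_post(monomial, stmt):
    """Approximate the successor of a monomial by another monomial.

    For a deterministic statement, wp(p, stmt) holds in s iff p holds in
    stmt(s). Predicate p is kept if monomial => wp(p, stmt), its negation is
    kept if monomial => wp(not p, stmt), and otherwise p is left out.
    """
    region = [s for s in STATES if holds(monomial, s)]
    result = {}
    for name, pred in PREDICATES.items():
        outcomes = {pred(stmt(s)) for s in region}
        if len(outcomes) == 1:
            result[name] = outcomes.pop()
    return result

increment = lambda s: {**s, "x": s["x"] + 1}   # x := x + 1
decrement = lambda s: {**s, "x": s["x"] - 1}   # x := x - 1
```

For instance, starting from the monomial x > 0 and lock = 1, `increment` keeps both predicates, while `decrement` loses all information about x > 0 and keeps only lock = 1, which shows how coarse the approximation can become.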
- Approximating with monomials is highly restrictive => can lead to very coarse approximations
- However, more precise computations can lead to an exponential number of calls to decision procedures => infeasible
- split region $\varphi$ recursively in fragments that can lead to states that satisfy $p_i$, or $\lnot p_i$, respectively
  $post^a(\varphi, t) = post_1(\varphi, t)$, where
  $post_k(\varphi, t) = p_k \land post_{k+1}(\varphi \land pre^c(\gamma(p_k), t), t) \ \lor\ \lnot p_k \land post_{k+1}(\varphi \land pre^c(\gamma(\lnot p_k), t), t)$, for 1 [...]

[...] compute once for each statement its effect on predicate $p_i$
=> produce an abstract boolean program in which every statement has as effect the assignment of every predicate with a new value:
- true, for predicate combinations that imply $wp(\gamma(p_i), t)$
- false, for predicate combinations that imply $wp(\gamma(\lnot p_i), t)$
- unknown, otherwise
- also called Cartesian abstraction (independent for each predicate)

Example: SLAM project [Microsoft Research]
(Software (Specifications), Languages, Analysis and Model checking)
Goal: verification of safety properties (invariants)
example: a program observes API usage rules (such as: calls to lock() and unlock() alternate)
- focused mainly on detecting interface errors
- applied to device drivers for Windows NT / XP
Characteristics:
- needs no user annotation of program (only specifying rules as automata monitoring correct behavior)
- automated counterexample-guided abstraction refinement

- abstract model is the program control flow graph augmented with a chosen set of boolean predicates over program variables (initial set of predicates may be empty)
- this finite representation is model checked to find violations of the specification
- if the model is correct, the program is correct (conservative abstraction)
- if a counterexample is found, it is explored
  symbolically in the concrete program, retaining (cor)relations among variables
- if the counterexample is feasible, an error has been found
- counterexample may be infeasible, if the conjunction of the conditions needed to traverse the required branches is unsatisfiable (false)
  => counterexample due to coarse abstraction
- unsatisfiable core of the formula suggests predicates to refine the abstraction
- procedure is repeated with new (augmented) set of predicates
This is a semi-algorithm; termination is not guaranteed

    do {
        /* fragment of device driver, [Ball & Rajamani '01] */
        [...]
        request = devExt->WriteListHeadVa;
        if (request && request->status) {
            devExt->WriteListHeadVa = request->Next;
            [...]
                irp->IoStatus.Status = STATUS_SUCCESS;
                irp->IoStatus.Information = request->Status;
            } else {
                irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
                irp->IoStatus.Information = request->Status;
            }
            SmartDevFreeBlock(request);
            IoCompleteRequest(irp, IO_NO_INCREMENT);
            [...]

    state {
        enum { Unlocked=0, Locked=1 } state = Unlocked;
    }
    KeAcquireSpinLock.return {
        if (state == Locked) abort;
        else state = Locked;
    }
    KeReleaseSpinLock.return {
        if (state == Unlocked) abort;
        else state = Unlocked;
    }

- start from the predicates in the specification
- use nondeterministic if where the truth value is unknown
- remove irrelevant instructions (skip)

    do {
        [...]
        if (*) {
            B: [...]
            if (*) {
            } else {
        [...]

Specification translated into C; original program is instrumented
(original program correct <=> instrumented program cannot reach error)

Bebop: calculates reached states for every statement of the boolean program, using an interprocedural dataflow analysis algorithm
state = assignment to variables in scope
set of states = boolean function, represented as BDD
computation with sets of states: captures correlation between variables
- does not expand procedures, exploits locality of variables
- uses an explicit control flow graph
- complexity: linear in size of CFG; exponential in number of vars in scope
For the given example: the model checker finds that A: KeAcquireSpinLock() could be called twice successively (an error)

A theorem prover is used to check if the counterexample in the abstract program is really a counterexample in the concrete program
Evaluates program statements using symbolic constants until it finds that the assignment at the end of the path is feasible, or finds an inconsistency along the way
For an inconsistency, a minimal unsatisfiable formula is found and the corresponding predicates are generated
In the example: nPacketsOld = nPackets and nPacketsOld != nPackets
decision procedures are incomplete => might return "don't know"
- the boolean program is then regenerated

    /* choose(p1, p2) = p1 ? T : p2 ? F : nondet */
    B: /* b == (nPackets == nPacketsOld) */

- at present: programs of about 10 kloc and tens / hundreds of boolean variables can be analyzed in (tens of) minutes
- with optimisations, 100 kloc may be reached
Available verifiers for C: BLAST (UC Berkeley), MAGIC (CMU)
Optimisation: lazy abstraction [Henzinger, Jhala, Majumdar, Sutre '02]
- does not refine abstraction at each iteration
- current abstraction is refined with new predicates only in code fragments where this is necessary (on-the-fly)
  => preserves locality (e.g., different abstractions for then / else branches)
The second, refined abstraction is sufficient to prove correctness

November 13, 2017

- locks
- semaphores (binary, counting)
- monitors
- conditional critical regions

Process calculi (process algebras): algebraic notation for describing processes, their sequential and parallel composition, and communication over channels

Communicating Sequential Processes (Hoare, 1978)
alphabet of actions: $\alpha V = \{in1p, in2p, small, large, out1p\}$
$V = (in2p \to (large \to V \mid small \to out1p \to V) \mid in1p \to small \to V)$
or formally, as least fixpoint ($\mu$) of the above equation system:
$V = \mu X.\ (in2p \to (large \to X \mid small \to out1p \to X) \mid in1p \to small \to X)$
operators: deterministic choice, nondeterministic choice, interleaving, synchronization on an event, hiding events (e.g. synchronization)

Calculus of Communicating Systems (Milner, 1980)
later π-calculus, allowing communicating channel names => mobility

based on Hoare's Conditional Critical Regions

    public int get() {
        atomic (items != 0) {
            items--;
            return buffer[items];
        }
    }

What's missing:
- what is the data protected?
- when is a blocked thread released?
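To make the semantics of the guarded block `atomic (items != 0) { ... }` concrete, it can be emulated with a lock and a condition variable. This is only an approximation under stated assumptions: it uses Python's `threading.Condition`, not a transactional implementation, and the class name `Buffer` and its `put` method are hypothetical additions for the example.

```python
import threading

class Buffer:
    """Shared stack-like buffer; get() mimics `atomic (items != 0) { ... }`."""

    def __init__(self):
        self._cond = threading.Condition()   # one lock guards the whole buffer
        self._buffer = []

    def put(self, value):
        with self._cond:
            self._buffer.append(value)
            self._cond.notify_all()          # shared update: waiters re-check their guard

    def get(self):
        with self._cond:
            # Block until the guard `items != 0` holds, then run the body
            # while still holding the lock.
            self._cond.wait_for(lambda: len(self._buffer) != 0)
            return self._buffer.pop()
```

In this emulation both open questions have explicit answers: the protected data is whatever is accessed under the one condition object, and a blocked thread is released when another thread calls `notify_all()` after an update. The conditional critical region leaves both implicit, which is what makes it attractive to write and harder to implement efficiently.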
- dynamically non-conflicting executions can operate concurrently
- CCR conditions re-evaluated only on a shared update
- non-blocking implementation (prevents deadlock, priority inversion)
- minimal restrictions for code enclosed in atomic
- low implementation overhead outside CCRs

    void STMStart()
    void STMAbort()
    boolean STMCommit()
    boolean STMValidate()
    void STMWait()

Clojure: dynamic language (Lisp dialect) compiled to Java bytecode
- Refs allow shared use of mutable storage locations
- mutation of a location allowed only in a transaction
- values are immutable, including composite ones
- [...] is actually a function that returns a new value; the old value still exists and can be used
To change state: construct a new compound value, change the reference => can be done much easier

Everything is an actor
Actors may:
- send messages to other actors
- create new actors (a finite number)
- designate behavior for the next message received
Similar to Smalltalk (send messages), process algebras

Examples in Oz [Wikipedia]
- Programs wait until variables are bound to values

    thread
       Z = X+Y     % waits until both X and Y are bound
       {Browse Z}  % shows the value of Z
    end
    thread X = 40 end
    thread Y = 2 end

- immutable values (cannot change while bound) [after van Roy and Haridi]

- out(T) adds tuple T to the tuple space
- in(T) reads and removes a tuple (based on pattern matching)
- rd(T) reads nondestructively
- eval creates a new process evaluating a tuple
(used for IPC) can be implemented with a lock, a dictionary and a concurrent queue

    init() { out("head", 0); out("tail", 0); }
    put(elem) { in("tail", ?tail); out("elem", tail, elem); out("tail", tail+1); }
    take(elem) { in("head", ?head); out("head", head+1); in("elem", head, ?elem); }

http://www.lindaspaces.com/teachingmaterial/LindaTutorial_Jan2006.pdf

15 November 2017

- livelock (loop without useful progress)
- starvation: inequitable resource access (threads that do not get access, though no deadlock overall)
- [...] in particular, data races
  simple source statement (++) may not be atomic in binary code
  variables covering
  several memory words (non-atomic writes)

Concurrent programs have synchronization primitives, but how are they implemented?
e.g. with hardware support: test-and-set instruction

    while (test_and_set(lock) == 1);

more general: compare-and-swap

    /* executed atomically by the hardware */
    int compare_and_swap(int *x, int old, int new) {
        int current = *x;
        if (current == old) *x = new;
        return current;
    }

Peterson's algorithm:

    /* left thread */
    while (1) {
        L1: flag[0] = true;
        L2: turn = 1;
        L3: while (flag[1] && turn==1);
        C0: flag[0] = false;
    }

    /* right thread */
    while (1) {
        R1: flag[1] = true;
        R2: turn = 0;
        R3: while (flag[0] && turn==0);
        C1: flag[1] = false;
    }

Designed for single-processor shared memory
Not safe in a multicore setting (relaxed memory consistency)

Data races happen when
- two threads access a variable, and at least one does a write access
- the threads are not explicitly synchronized
Analyzing race conditions is complicated by reordering (through compiler optimizations)

    init: x = 0; y = 0;
    t1: r1 = x;     t2: r2 = y;
        y = 2;          x = 1;

Possible outcomes (r1, r2): (0, 0), (1, 0), (0, 2)
But by reordering in t1 and t2 we could obtain r1 = 1, r2 = 2!
This result does not match sequential consistency (that we are intuitively used to): all memory accesses correspond to a single (linear) order, and the order of accesses in any thread is preserved

Understanding concurrency problems is often hard
- Difficult to exercise a certain execution sequence: needs control over / changes to scheduler, external conditions
- Error traces might be very rare (in certain complex scenarios)
- Error conditions may be hard to reproduce ("Heisenbugs")
- Exhaustive exploration of all execution traces is infeasible (exponential in number of threads / their size)

[Farchi, Nir, Ur: Concurrent bug patterns and how to test them, 2003]
x = 0 || x = 0x101 => x == 1 possible!!
if the bytes are written separately (hi from 0, low from 0x101)

even if accesses are protected, the object may change in between:

    lock(); idx = table.find(key); unlock();
    if ([...]) { lock(); table[idx] = newval; unlock(); }

wrong lock or no lock (e.g. programmer unfamiliar with code):

    t1: synchronized(o1) {n++;}    t2: n++;   // not sync
  or
    t1: synchronized(o1) {n++;}    t2: synchronized(o2) {n++;}

double-checked locking: "optimizing" on-demand initialization

    class Foo {
        Helper helper = null;
        Helper getHelper() {
            if (helper == null)
                synchronized (this) {
                    if (helper == null)
                        helper = new Helper();
                }
            return helper;
        }
    }

Problem: the compiler is free to reorder for optimization

[...] (but which may happen): sleep() wrongly used to guarantee a delay
notify lost when executed before wait:

    t1: synchronized(o) { o.wait(); }  ||  t2: synchronized(o) { o.notifyAll(); }

after wait: on resume, must check the awaited condition again (resume might have happened due to other causes)
code written assuming the critical section won't block: false, if (bad) code is provided by someone else
"orphan" threads, if the creator thread terminates with an error: may lead to deadlock

A concurrent language must have a memory model that is [...] and which does [...], by restricting optimizations
Solution [JSR 133; Manson, Pugh, Adve, POPL'05]:
- define a class of programs (well-synchronized) for which sequential consistency is ensured
- minimal guarantees for other programs (not well-synchronized)
Principle: define a happens-before order [Lamport] between program actions: the transitive closure of
a) ordering of synchronization actions (between any unlock and lock on the same monitor, and between write and read on a volatile variable)
b) program order (within each execution thread)
Reading a volatile variable: last value written in synchronization order
Reading a non-volatile variable: any value which is not written later according to happens-before and is not obsoleted by another write
Warning: [...] does NOT mean [...]
Race condition = conflicting accesses (r-w, w-w) not ordered by
happens-before
Well-synchronized program = does not have race conditions

by default, JUnit observes only the thread that launched the test => does not detect exceptions in threads launched later
need frameworks with features adapted to concurrency
Various JUnit additions, e.g.:
ConcJUnit [Rice University]
- creates observers for a group of execution threads
- warns if other threads are still running after the main thread completes (should have been handled with a join)
- may insert arbitrary delays => generates other interleavings
RunnerScheduler (experimental API addition)
- idea: create variation in thread scheduling
ConTest [IBM Haifa]
- instruments program (sleep(), yield(), etc.) or simulates delays, message loss, etc.
- random or guided variation in scheduling
- measures coverage with respect to all possible schedules / interleavings
CHESS [Microsoft Research]
- captures calls to synchronization functions
- systematically generates executions with new schedules, in increasing order of preemption count
- can reproduce generated executions

Many proposed solutions
Widely used algorithm: Eraser
- combines static and dynamic analysis: by analyzing one execution, finds errors in others
- keeps track of locks acquired by each thread
- tries to derive which lock protects which shared object

    init:   C(v) = all-locks;               // for each variable v
    access: C(v) = C(v) ∩ locks-held(t);    // on access by t
            if (C(v) = ∅) warning();        // unprotected access!
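The lockset refinement above is short enough to sketch directly. The following is a minimal illustration of the basic rule only (no read/write lock distinction, no per-variable state machine); the class and method names are hypothetical, and a real tool would obtain the access events by instrumenting the running program rather than through explicit calls.

```python
class Lockset:
    """Basic lockset refinement: C(v) shrinks to the locks held on every access."""

    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.candidates = {}   # variable name -> current candidate set C(v)
        self.warnings = []     # variables whose candidate set became empty

    def access(self, var, locks_held):
        # init: C(v) = all locks, applied lazily on the first access to var
        current = self.candidates.get(var, self.all_locks)
        # access: C(v) = C(v) & locks_held(t)
        current = current & frozenset(locks_held)
        self.candidates[var] = current
        if not current:
            # no single lock was held on all accesses so far
            self.warnings.append(var)

# Hypothetical trace: two threads increment n under different locks.
checker = Lockset({"o1", "o2"})
checker.access("n", {"o1"})   # t1: synchronized(o1) { n++; }
checker.access("n", {"o2"})   # t2: synchronized(o2) { n++; }
```

After the second access the candidate set for `n` is empty, so `n` is reported even though each access was made while holding some lock. The check does not depend on the two accesses actually overlapping in the observed run, which is how one execution can reveal a race that would only manifest in another.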
if extended, may distinguish read and write locks, tracking the state of each variable (virgin, exclusive, shared, shared-modified)
Conservative algorithm, may give false alarms for correct programs (which do not associate a variable with a unique lock throughout)

[Artho, Havelund, Biere 2003]
Errors: when the granularity of protected variables is not the same over time

    void swap() {
        int lx, ly;
        synchronized(this) { lx = this.x; ly = this.y; }
        synchronized(this) { this.x = ly; this.y = lx; }
    }
    void reset() {
        synchronized(this) { this.x = 0; }
        synchronized(this) { this.y = 0; }
    }

Member access is synchronized, but swap and reset may interfere!
=> Need analysis not just for variables (what locks protect them?) but also starting from locks (what variable sets are covered by each?)

Completely explores program executions
- simulates nondeterminism through a custom virtual machine which allows choosing scheduling variants at each step and returning to unexplored ones (similar to backtracking)
- works at bytecode level; allows checking for deadlocks, exceptional conditions, assertions in code
- limited to smaller programs (10 kloc): "state space explosion"
  size of stored states (number of program variables)
  number of possible traces (exponential in number of threads)

Lu, Park, Seo, Zhou: Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS'08
Research questions:
- what kinds of real bugs can be detected?
- are assumptions valid? e.g. focus on single-variable access
- how helpful are tools in diagnosing and fixing?
105 randomly selected real world concurrency bugs: 74 non-deadlock bugs + 31 deadlock bugs
4 large open-source programs: Apache, Mozilla, MySQL, OpenOffice
- 97%: two patterns, atomicity or order violation; the latter not well addressed by tools
- 97%: two threads, circular wait
- 96%: reproducible with a partial order between 2 threads
- 92%: order between [...] suggests [...]
- 66%: involved only one variable
- 22%: caused by one thread acquiring a resource held by itself
- 73% of non-deadlock bugs not fixed by adding locks
- 61%: fix prevents a thread from acquiring a lock; may cause other bugs
Transactional memory could avoid 39% of bugs + 42% more by addressing some concerns (I/O, atomic GC)

R. Xin et al., An Automation-assisted Empirical Study on Lock Usage for Concurrent Programs, ICSM 2013
4 programs: Aget, Apache httpd, MySQL, Pbzip2, up to 786 Kloc
issues to study:
- (language) characteristics of lock usage (function lock counts)
- lock usage patterns
- lock usage evolution
findings:
- 80% of the lock-related functions acquire only one lock
- simple lock patterns account for 55% of all lock usage
- only 12 out of 527 detected patterns are conditional (more error-prone)
- only 0.65% of functions are lock related

Wojcicki & Strooper, A State-of-Practice Questionnaire on Verification and Validation for Concurrent Programs, PADTAD'06
35 survey respondents, Java development
Relevant defects: deadlock, interference (> 80%), starvation (50%)
Techniques: code inspection, JUnit test (> 80%), static analysis (50%, mostly FindBugs), code coverage, model checking (20%)

Kester, Mwebesa, Bradbury (SCAM 2010): How Good is Static Analysis at Finding Concurrency Bugs?
- used 12 benchmarks from Java PathFinder and IBM ConTest
- evaluated 3 tools: FindBugs, JLint, Chord
- recall: 30-33% of actual known bugs
- precision: 100% (Chord), 78% (JLint), 31% (FindBugs)
- threat to validity: small-scale evaluation (13 bugs)

Sadowski & Yi, How Developers Use Data Race Detection Tools, SPLASH PLATEAU'14
Two data race analysis mechanisms: ThreadSafety and TSan
ThreadSafety: static, annotation-based, implemented in Clang
- led to 18 bug-fixing commits (1 month) in a small section of code
TSan (ThreadSanitizer): dynamic identification of data races
- TSan v1: Valgrind, 20-300x slowdown
- TSan v2: LLVM, happens-before, 5-15x slowdown
- TSan in 30 min found a Chrome bug hunted for 6 months
Team A: ThreadSafety for docs, nightly runs of TSan find 1 race per 10 weeks
Team B: added annotations to all core libraries; ensures annotation for all mutexes (automatically searched)
Team C: stable synch code, no payoff for ThreadSafety, not heard of TSan
Team D: ThreadSafety for tricky code, not heard of TSan
- Reproducibility & low false positives are important
- Team culture matters
- Tradeoff: races vs deadlocks (crash is easy, inconsistency is hard)
- Manual inspection is the implicit comparison point
- Good docs important for building mental models
- Limitations: slow speed and lack of coverage (TSan), difficulty of annotation (ThreadSafety)

Program semantics, analysis and verification
Formal verification, Lecture 8, December 8, 2005, Marius Minea

[Nielson & Nielson, Semantics with Applications, Wiley 1992, 1999]
Semantics = describing the meaning (behavior) of programs
formally: express the meaning in terms of a mathematical model
Operational semantics: describes how the computation is executed (effects of a statement on the program state)
- natural (big-step) operational semantics: overall execution
- structural (small-step) operational semantics: effect composed from individual statements
Denotational semantics: describes the effect of a program construct, typically as a function;
not how execution is done
- direct style: meaning of standalone construct
- continuation style: meaning if followed by a given continuation
Axiomatic semantics: assertions about the effect of executing the program (can focus on some properties of interest)
- partial correctness: what is true if the program terminates
- total correctness: also expresses when the program terminates

- first practical successes of formal verification were for hardware
- but started by formalizing programming language semantics
R. W. Floyd, Assigning Meanings to Programs (1967):
"an adequate basis for formal definitions of the meanings of programs [...] in such a way that a rigorous standard is established for proofs"
"if the initial values of the program variables satisfy the relation R1, the final values on completion will satisfy the relation R2"
- method: annotating a program (or flow graph) with assertions
- introduces the notion of verification condition: a formula $V_c(P, Q)$ such that if P is true before executing c, then Q is true upon exit
  and [...] for a program + initial condition
- very general approach: assertions expressed in first-order logic
- develops general rules for combining verification conditions and specific rules for different instruction types
- explicitly introduces invariants for reasoning about loops
- handles termination using a positive decreasing measure

- like Floyd, handles preconditions and postconditions for executing an instruction, but the notion of Hoare triple better displays the relation between the statement and the two assertions
- works with source programs, not flow graphs
- Notation: {P} S {Q}: if S is executed in a state that satisfies P and it terminates, the resulting state satisfies Q
- Later: similar reasoning for [P] S [Q]: if S is executed in a state that satisfies P, then it terminates and the resulting state satisfies Q
Rigorous application: C. A. R. Hoare, Proof of a Program: FIND (1971)

- rules defined for each statement type individually; by combining them, we can reason about entire programs
Assignment: $\{Q[x/E]\}\ x := E\ \{Q\}$, where $Q[x/E]$ is the substitution of x with E in Q
Example: {x = y - 2} x := x + 2 {x = y}
(in x = y, substitute x with the assigned expression, x + 2, and obtain x + 2 = y, thus x = y - 2)
Writing the rule "backward" (P as a function of Q) simplifies it
Sequencing: $\dfrac{\{P\}\ S_1\ \{Q\} \qquad \{Q\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$
Conditional: $\dfrac{\{P \land E\}\ S_1\ \{Q\} \qquad \{P \land \lnot E\}\ S_2\ \{Q\}}{\{P\}\ \text{if } E \text{ then } S_1 \text{ else } S_2\ \{Q\}}$

Loops: the invariant is key in reasoning about programs
- must find an invariant I = a property which stays true before / after each loop iteration
- if the loop is entered (E), the invariant is maintained after loop body S
- if the loop is not entered ($\lnot E$), the invariant implies postcondition Q
$\dfrac{\{I \land E\}\ S\ \{I\} \qquad I \land \lnot E \Rightarrow Q}{\{I\}\ \text{while } E \text{ do } S\ \{Q\}}$

    while (lo [...] m)   /* both cases maintain lo [...] m => n >= m+1 => n >= lo */
    else hi = m;         /* !(n [...] n [...] n [...] lo==n && n==hi */

Consider {P} *x = 2 {v + *x = 4}. What is the precondition P?
Correct answer: $v = 2 \lor x = \&v$
But using the simple rule, $(v + {*x} = 4)[{*x} \leftarrow 2]$ misses the second case => we must model memory
m = memory, a = address, d = data
Consider functions rd(m, a), which returns d, and wr(m, a, d), which returns m'
We have the rule:
$$rd(wr(m, a_1, d), a_2) = \begin{cases} d & \text{if } a_1 = a_2 \\ rd(m, a_2) & \text{if } a_1 \neq a_2 \end{cases}$$

wp(S, Q): the weakest precondition of statement S with respect to postcondition Q
- wp is a predicate transformer (transforms post- into precondition)
- allows the definition of a calculus with such transformers

Assignment: $wp(x := E,\ Q) = Q[x \leftarrow E]$ (see Hoare's rule)
Sequencing: $wp(S_1; S_2,\ Q) = wp(S_1,\ wp(S_2, Q))$
Conditional: $wp(\text{if } E \text{ then } S_1 \text{ else } S_2,\ Q) = (E \Rightarrow wp(S_1, Q)) \land (\lnot E \Rightarrow wp(S_2, Q))$
For iteration, we need a recursive computation
Define $wp_k$ assuming the loop terminates in at most k iterations:
$wp_0(\text{while } E \text{ do } S,\ Q) = \lnot E \Rightarrow Q$ (loop not entered)
$wp_{k+1}(\text{while } E \text{ do } S,\ Q) = (E \Rightarrow wp(S,\ wp_k(\text{while } E \text{ do } S,\ Q))) \land (\lnot E \Rightarrow Q)$

To prove {P} while E do S {Q} we need an invariant I such that:
- $P \Rightarrow I$ (I sufficiently weak to hold initially)
- $\{I \land E\}\ S\ \{I\}$ (I is an invariant)
- $I \land \lnot E \Rightarrow Q$ (I sufficiently strong to be useful)
Determining loop invariants is difficult:
Trivial example: {x [...] 10, but not as useful)
Usually: iterative calculation (fixpoint); sometimes needs invariant strengthening

Using Floyd-Hoare-style reasoning, we can express properties as predicates over the state variables of the program
- same as, e.g., atomic propositions were defined in Spin
- sample predicates: x > 0, lock = 1, x + 1 [...]
We need a (possibly approximate) method for backward/forward exploration in the abstract state space

General framework:
- symbolic approach, with state sets represented as formulas
  $post(r, t) = \{ s' \mid \exists s \in r .\ s \xrightarrow{t} s' \}$: successor of region (state set) r
- we seek the abstract operator $post^{\#}(r, t) = \alpha(post(\gamma(r), t))$
- in general, this computation is infeasible/expensive in practice (particularly the
abstraction operation $\alpha$) => abstractions with different kinds of approximation

- each predicate represented in disjunctive normal form (as disjunctions of monomials $\varphi$)
  monomial = conjunction (product) of predicates $p_i$ or their negation $\lnot p_i$
- successor $post^{\#}(\varphi, t)$ for the transition (statement) t also approximated by a monomial
  => we determine for each predicate if the monomial contains $p_i$ or $\lnot p_i$ (or none)
  => we determine for each predicate $p_i$ if $post^{\#}(\varphi, t)$ implies $p_i$ or $\lnot p_i$, i.e., if $\varphi \Rightarrow wp(t, p_i)$ or $\varphi \Rightarrow wp(t, \lnot p_i)$

- Approximating with monomials is highly restrictive => can lead to very coarse approximations
- However, more precise computations can lead to an exponential number of calls to decision procedures => infeasible
- split region $\varphi$ recursively in fragments that can lead to states that satisfy $p_i$, or $\lnot p_i$, respectively
  $post^{\#}(\varphi, t) = post_1(\varphi, t)$, where
  $post_k(\varphi, t) = (p_k \land post_{k+1}(\varphi \land pre(p_k, t),\ t)) \lor (\lnot p_k \land post_{k+1}(\varphi \land pre(\lnot p_k, t),\ t))$, for $1 \le k$ [...]

- compute once for each statement its effect on predicate $p_j$
  => produce an abstract boolean program in which every statement has as effect the assignment of every predicate with a new value:
  - true, for predicate combinations that imply $wp(t, p_j)$
  - false, for predicate combinations that imply $wp(t, \lnot p_j)$
  - unknown, otherwise
- also called Cartesian abstraction (independent for each predicate)

Example: SLAM project [Microsoft Research]
(Software (Specifications), Languages, Analysis and Model checking)
Goal: verification of safety properties (invariants)
example: a program observes API usage rules (such as: calls to lock() and unlock() alternate)
- focused mainly on detecting interface errors
- applied to device drivers for Windows NT/XP
Characteristics:
- needs no user annotation of program
(only specifying rules as automata monitoring correct behavior)
- automated counterexample-guided abstraction refinement

- abstract model is the program control flow graph augmented with a chosen set of boolean predicates over program variables (initial set of predicates may be empty)
- this finite representation is model checked to find violations of the specification
- if the model is correct, the program is correct (conservative abstraction)
- if a counterexample is found, it is explored symbolically in the concrete program, retaining (cor)relations among variables
- if the counterexample is feasible, an error has been found
- the counterexample may be infeasible, if the conjunction of the conditions needed to traverse the required branches is unsatisfiable (false) => counterexample due to coarse abstraction
- the unsatisfiable core of the formula suggests predicates to refine the abstraction
- the procedure is repeated with the new (augmented) set of predicates
This is a semialgorithm; termination is not guaranteed

    do { /* fragment of device driver, [Ball & Rajamani '01] */
        request = devExt->WriteListHeadVa;
        if (request && request->status) {
            devExt->WriteListHeadVa = request->Next;
            irp = request->irp;
            if (request->status > 0) {
                irp->IoStatus.Status = STATUS_SUCCESS;
                irp->IoStatus.Information = request->status;
            } else {
                irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
                irp->IoStatus.Information = request->status;
            }
            SmartDevFreeBlock(request);
            IoCompleteRequest(irp, IO_NO_INCREMENT);
        }
    }

    state {
        enum { Unlocked=0, Locked=1 } state = Unlocked;
    }
    KeAcquireSpinLock.return {
        if (state == Locked) abort;
        else state = Locked;
    }
    KeReleaseSpinLock.return {
        if (state == Unlocked) abort;
        else state = Unlocked;
    }

Specification translated into C;
original program is instrumented
(original program correct <=> instrumented program cannot reach error)

- start from the predicates in the specification
- use nondeterministic if where truth value unknown
- remove irrelevant instructions (skip)

    do {
        A: skip;
        if (*) {
            B: if (*) { skip; } else { skip; }
        }
    }
    C:

Bebop: calculates reached states for every statement of the boolean program, using an interprocedural dataflow analysis algorithm
state = assignment to variables in scope
set of states = boolean function, represented as BDD
computation with sets of states: captures correlation between variables
- does not expand procedures, exploits locality of variables
- uses an explicit control flow graph
- complexity: linear in size of CFG; exponential in number of variables in scope
For the given example: the model checker finds that at A: KeAcquireSpinLock() could be called twice successively (an error)

A theorem prover is used to check if the counterexample in the abstract program is really a counterexample in the concrete program
Evaluates program statements using symbolic constants until it finds that the assignment at the end of the path is feasible, or finds an inconsistency along the way
For an inconsistency, a minimal unsatisfiable formula is found and the corresponding predicates are generated
in the example: nPacketsOld = nPackets and nPacketsOld != nPackets
decision procedures are incomplete => might return "don't know"
- the boolean program is then regenerated

    do {
        A: /* b == (nPackets == nPacketsOld) */
        if (*) {
            B: if (*) { skip; } else { skip; }
            /* choose(p1, p2) == p1 ? T : p2 ? F : nondet */
        }
    }

The second, refined abstraction is sufficient to prove correctness

- at present: programs of about 10 kloc and tens/hundreds of boolean variables can be analyzed in (tens of) minutes
- with optimisations, 100 kloc may be reached
Available verifiers for C: BLAST (UC Berkeley), MAGIC (CMU)
Optimisation: lazy abstraction [Henzinger, Jhala, Majumdar, Sutre '02]
- does not refine the abstraction at each iteration
- the current abstraction is refined with new predicates only in code fragments where this is necessary (on-the-fly)
  => preserves locality (e.g., different abstractions for then/else branches)

Marius Minea marius@cs.upt.ro 14 November 2016

Any object (variable x, array element, structure field) of type T has an address of type T * where its value is stored [...], it is a pointer
Valid addresses are non-null
NULL indicates an invalid address; NULL is (void *)0, i.e., 0 cast to type void *
An address is a numeric value, but not of type int or unsigned; it may be printed with format specifier %p in printf
For low-level systems programming: types intptr_t and uintptr_t (from stdint.h) are the right size to hold a void *

We need to know how to
1. declare a variable of pointer (address) type
2. obtain a pointer (address) value
3. use a pointer (address) value
To use pointers correctly, need to (like for all variables/values):
1. be aware of their type
2. initialize them correctly
3. use the right operators/functions

type *ptrvar; => the variable ptrvar may contain the address of a value of type type
Examples: char *s; int *p;
When declaring several pointers, need * for each of them:
    int *p, *q;    two integer pointers
    int *p, q;     one pointer p and one integer q

From an array name (a pointer):
    int tab[N], *a = tab;    same as: int tab[N]; int *a; a = tab;
Declaring int tab[N]; array name tab has type int *
Taking the address of a variable:
    int n, *p = &n;    same as: int n; int *p; p = &n;
A string constant is a pointer to the contents (to first char):
    char *s = "test";    same as: char *s; s = "test";

The dereferencing (indirection) operator *: prefix operator, gives the object located at a given address. Operand:
pointer (address); result: object (variable) indicated by pointer
*p is an lvalue (can be assigned, like a variable); can also be used in an expression, like any value of that type
int *p says int * is the type of p and int is the type of *p
The operator * is the inverse of &:
*&x is the object at the address of x, that is, x
&*p is the address of the object at address p, that is, p
    int x, y, *p = &x; y = *p; *p = y;
x has type int => &x has type int *
p has type int * => *p has type int

Any variable has an address => pointer variables have addresses
Any expression has a type:
The address of a variable of type int has type int *
The address of a variable of type int * has type int **, etc.
Having declared int *p; => we can declare int **pp; the type of &p is int **, and initialize/assign it with &p
T * p; may be read:
    T* p;    p has type T *
    T *p;    *p has type T

    Variable          Value    Address
    int x = 5;        5        0x408
    int *p = &x;      0x408    0x51C
    int **pp = &p;    0x51C    0x9D0

    char **s;      address of char address
    char *t[8];    array of 8 char addresses

A declaration with initialization is NOT an assignment!
The * in a declaration is not applied as an indirection operator: it is written next to the declared variable, but belongs to the type
Declaration int *p; suggests that *p is an int, but the variable declared is p, NOT *p (*p is not an identifier), so the initializer is for p, NOT for *p
    int t[2] = { 3, 5 };    initializes t    WRONG: t = { 3, 5 };
    int x, *p = &x;    is like int x; int *p; p = &x;    (p is initialized/assigned, NOT *p)    *p = &x is a type error!
    char *p = "str";    is char *p; p = "str";    WRONG: *p = "str";

Programs can't have just pointers. These must point to something (useful data: need variables to store it in)
Declaring int n; means I want to have an integer. What for? What value does it have?
=> Better: initialize it in the declaration
Declaring char *p; only means I want to use the address of a char
Need:
    char *p = buf;                p points to array buf, declared before
    char *p = "...";              p points to a string constant
    char *p = strchr(buf, ...);   returned by function, could be NULL
It's an error to use uninitialized variables:
    int sum; for (i = 0; i [...]

    void swap(int *pa, int *pb) { int tmp = *pa; *pa = *pb; *pb = tmp; }
    int x = 3, y = 5; swap(&x, &y);

We use pointers
- to pass arrays (can't pass array in C)
- to return several results (return allows only one), e.g. min/max of an array; result/error code
When passing an array to a function, the address is passed: in int tab[LEN]; the name tab has type int *

Variants of printf/scanf with strings as source/destination
    int sprintf(char *s, const char *format, ...);
    int sscanf(const char *s, const char *format, ...);
sprintf has no size limit => may overflow buffer
Use instead: int snprintf(char *str, size_t size, const char *format, ...);
writing is limited to size chars including \0: safe option

    int n; char s[] = "..."; if (sscanf(s, "%d", &n) == 1) ...
(but we don't know where processing of string stopped)
    long strtol(const char *nptr, char **endptr, int base);
assigns to *endptr the address of first unprocessed char
    char *end; n = strtol(s, &end, 10);    base 10 or other
(similar functions exist for other numeric types, e.g. strtoul, strtod)
for base 10: n = atoi(s); returns 0 on error, but also for "0"; use only when string known to be good

Command line: program name with arguments (options, files, etc.)
Examples: gcc -Wall prog.c or ls directory or cp file1 file2
main can access the command line if declared with 2 args (conventionally named as follows):
    int argc        number of words in command line (arguments + 1)
    char *argv[]    array of argument addresses (strings)

    int main(int argc, char *argv[]) {
        printf("...", argv[0]);
        if (argc == 1) puts("...");
        else for (int i = 1; i [...]

argc >= 1; the argv[] array ends with a NULL element, argv[argc]
Running a command from a program: int system(const char *cmdline); returns -1 if can't run, or exit code of program

November 14, 2016
Language Support for Concurrency
- locks
- semaphores (binary, counting)
- monitors
- conditional critical regions

Programming Language Design and Analysis Lecture 10 Marius Minea Language Support for Concurrency

based on Hoare's Conditional Critical Regions

    public int get() {
        atomic (items != 0) {
            items--;
            return buffer[items];
        }
    }

What's missing:
what is the data protected?
when is a blocked thread released?
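For comparison, the same guarded region can be approximated with an explicit lock and condition variable, which forces the programmer to answer both questions by hand. This is a minimal sketch using POSIX threads, not the mechanism the slides describe; the names ccr_put, ccr_get, CAP, mtx and changed are assumptions introduced for illustration.

```c
#include <pthread.h>

#define CAP 16                       /* hypothetical buffer capacity */

static int buffer[CAP];
static int items = 0;
/* The protected data is named explicitly: buffer and items are
   only touched while holding mtx. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

/* Corresponds to: atomic (items != CAP) { buffer[items++] = v; } */
void ccr_put(int v)
{
    pthread_mutex_lock(&mtx);
    while (items == CAP)                  /* re-check guard after every wakeup */
        pthread_cond_wait(&changed, &mtx);
    buffer[items++] = v;
    pthread_cond_broadcast(&changed);     /* blocked threads re-evaluate guards */
    pthread_mutex_unlock(&mtx);
}

/* Corresponds to: atomic (items != 0) { items--; return buffer[items]; } */
int ccr_get(void)
{
    pthread_mutex_lock(&mtx);
    while (items == 0)
        pthread_cond_wait(&changed, &mtx);
    int v = buffer[--items];
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&mtx);
    return v;
}
```

Here a blocked thread is released whenever any update is broadcast, and all regions using the same lock are serialized even if they would not conflict; the atomic construct leaves both decisions to the implementation.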
- dynamically non-conflicting executions can operate concurrently
- CCR conditions re-evaluated only on a shared update
- non-blocking implementation (prevents deadlock, priority inversion)
- minimal restrictions for code enclosed in atomic
- low implementation overhead outside CCRs

    void STMStart()
    void STMAbort()
    boolean STMCommit()
    boolean STMValidate()
    void STMWait()

Clojure: dynamic language (Lisp dialect) compiled to Java bytecode
Refs allow shared use of mutable storage locations
mutation of a location is allowed only in a transaction

values are immutable, including composite ones
"changing" a value is actually a function that returns a new value
the old value still exists and can be used
To change state:
- construct a new compound value
- change the reference
=> can be done much easier

Everything is an actor
Actors may
- send messages to other actors
- create new actors (a finite number)
- designate behavior for the next message received
Similar to Smalltalk (send messages), process algebras

Examples in Oz [Wikipedia]
- Programs wait until variables are bound to values

    thread
        Z = X+Y       % waits until both X and Y are bound
        {Browse Z}    % shows the value of Z
    end
    thread X = 40 end
    thread Y = 2 end

- immutable values (cannot change while bound)

[after van Roy and Haridi]
out(T) adds tuple T to the tuple space
in(T) reads and removes tuple
(based on pattern matching)
rd(T) reads nondestructively
eval creates a new process evaluating a tuple (used for IPC)
can be implemented with a lock, a dictionary and a concurrent queue

    init() { out("head", 0); out("tail", 0); }
    put(elem) { in("tail", ?tail); out("elem", tail, elem); out("tail", tail+1); }
    take(elem) { in("head", ?head); out("head", head+1); in("elem", head, ?elem); }

http://www.lindaspaces.com/teachingmaterial/LindaTutorial_Jan2006.pdf

Marius Minea 16 November 2016

livelock (loop without useful progress)
starvation: inequitable resource access (threads that do not get access, though no deadlock overall)
in particular, data races
- simple source code statements (++) may not be atomic in machine code
- variables covering several memory words (non-atomic writes)

Concurrent programs have synchronization primitives, but how are they implemented?
e.g. with hardware support: test and set instruction
    while (test_and_set(lock) == 1);
more general: compare-and-swap (executed atomically)
    int cas(int *x, int old, int new) {
        int current = *x;
        if (current == old) *x = new;
        return current;
    }

    while (1) {
        L1: flag[0] = true;
        L2: turn = 1;
        L3: while (flag[1] && turn==1);
        C0: flag[0] = false;
    }

    while (1) {
        R1: flag[1] = true;
        R2: turn = 0;
        R3: while (flag[0] && turn==0);
        C1: flag[1] = false;
    }

Designed for single-processor shared memory
Not safe in a multicore setting (will discuss)

Data races happen when
- two threads access a variable, and at least one does a write access
- the threads are not explicitly synchronized
Analyzing race conditions is complicated by instruction reordering (through compiler optimizations)

    init: x = 0; y = 0;
    t1: r1 = x;        t2: r2 = y;
        y = 2;             x = 1;

Possible outcomes (r1, r2): (0, 0), (1, 0), (0, 2)
But by reordering in t1 and t2 we could obtain r1 = 1, r2 = 2!
This result does not match sequential consistency (that we are intuitively used to): all memory accesses correspond to a single (linear) order, and the order of accesses in any thread is preserved

Understanding concurrency problems is often hard
Difficult
to exercise a certain execution sequence: needs control over (or changes to) the scheduler and external conditions
Error traces might be very rare (in certain complex scenarios)
Error conditions may be hard to reproduce ("Heisenbugs")
Exhaustive exploration of all execution traces is infeasible (exponential in the number of threads and their size)

[Farchi, Nir, Ur: Concurrent bug patterns and how to test them, 2003]

Non-atomic operations assumed atomic:
    x = 0 || x = 0x101 => x == 1 possible!
if the bytes are written separately (hi from 0, low from 0x101)

Two-stage access: even if accesses are protected, the object may change in between
    lock(); idx = table.find(key); unlock();
    if (...) { lock(); table[idx] = newval; unlock(); }

Wrong lock or no lock (e.g. programmer unfamiliar with code)
    t1: synchronized(o1) {n++;}    t2: n++;    // not sync
or
    t1: synchronized(o1) {n++;}    t2: synchronized(o2) {n++;}

Double-checked locking: "optimizing" on-demand initialization
    class Foo {
        private Helper helper = null;
        public Helper getHelper() {
            if (helper == null)
                synchronized (this) {
                    if (helper == null) helper = new Helper();
                }
            return helper;
        }
    }
Problem: the compiler is free to reorder for optimization

[...] (but which may happen): sleep() wrongly used to guarantee a delay
Lost notify: when notify is executed before wait:
    t1: synchronized(o) { o.wait(); } || t2: synchronized(o) { o.notifyAll(); }
[...]: on resume, must check the awaited condition (resume might have happened due to other causes)
Blocking critical section: code written assuming the critical section won't block; false, if (bad) code is provided by someone else
"orphan" threads: if the creator thread terminates with an error, may lead to deadlock

A concurrent language must have a memory model that is [...] and which does [...], by restricting optimizations
Solution [JSR 133; Manson, Pugh, Adve, POPL'05]:
define a class of programs (well-synchronized) for which sequential consistency is ensured
+ minimal guarantees for the rest of programs (even if incorrectly synchronized)
Principle: define a happens-before order [Lamport] between program actions: the transitive closure of
a) ordering of synchronization actions (between any unlock and lock on the same monitor, and between write and read on a volatile variable) and
b) program order (within each execution thread)

Reading a volatile variable: last value
written in synchronization order
Reading a non-volatile variable: any value which is not written later according to happens-before and is not obsoleted by another write
Warning: happens-before does NOT mean "executed earlier in time"!
Race condition = conflicting accesses (r-w, w-w) not ordered by happens-before
Well-synchronized program = does not have race conditions

by default, JUnit observes the thread that launched the test => does not detect exceptions in threads launched later
need frameworks with features adapted to concurrency
Various JUnit additions, e.g.
ConcJUnit [Rice University]
- creates/observes a group of execution threads
- warns if other threads are still running after the main thread completes (should have been handled with a join)
- may insert arbitrary delays => generates other interleavings
RunnerScheduler (experimental API addition)
- idea: create variation in thread scheduling

ConTest [IBM Haifa]
- instruments the program (sleep(), yield(), etc.) or simulates delays, message loss, etc.
- random or guided variation in scheduling
- measures coverage with respect to all possible schedules/interleavings
CHESS [Microsoft Research]
- captures calls to synchronization functions
- systematically generates executions with new schedules, in increasing order of preemption count
- can reproduce generated executions

Many proposed solutions
One widely used algorithm: Eraser
- combines static and dynamic analysis
- by analyzing one execution, finds errors in others
- keeps track of locks acquired by each thread
- tries to derive which lock protects which shared object

    init:   C(v) = all-locks;                  // for each variable v
    access: C(v) = C(v) ∩ locks-held(t);       // on access by t
            if (C(v) = ∅) warning();           // unprotected access!
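The lockset refinement above can be sketched with bit masks, one bit per lock. This is only an illustration under simplifying assumptions (at most 32 locks, a fixed number of threads and variables, no distinction between reads and writes); all names such as eraser_access are hypothetical and do not come from the Eraser tool itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lockset_t;            /* bit i set = lock i is in the set */
#define ALL_LOCKS   ((lockset_t)~0u)
#define MAX_THREADS 8                  /* hypothetical bounds for the sketch */
#define MAX_VARS    8

static lockset_t held[MAX_THREADS];    /* locks-held(t) */
static lockset_t candidates[MAX_VARS]; /* C(v) */

/* init: C(v) = all-locks for each variable v; no thread holds a lock */
void eraser_init(void)
{
    for (int t = 0; t < MAX_THREADS; ++t) held[t] = 0;
    for (int v = 0; v < MAX_VARS; ++v) candidates[v] = ALL_LOCKS;
}

void eraser_lock(int t, int lock)   { held[t] |= (lockset_t)1 << lock; }
void eraser_unlock(int t, int lock) { held[t] &= ~((lockset_t)1 << lock); }

/* access: C(v) = C(v) ∩ locks-held(t).
   Returns false when C(v) becomes empty, i.e. no single lock
   has protected every access to v seen so far. */
bool eraser_access(int t, int v)
{
    candidates[v] &= held[t];
    return candidates[v] != 0;
}
```

If thread 0 accesses a variable while holding lock 1 and thread 1 later accesses it while holding only lock 2, the candidate set becomes empty and the second access is reported, even if the two accesses never actually overlapped in the observed run.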
if extended, may distinguish read and write locks, tracking the state of each variable (virgin, exclusive, shared, shared-modified)
Conservative algorithm, may give false alarms for correct programs (which do not associate a variable with a unique lock throughout)

[Artho, Havelund, Biere 2003]
Errors: when the granularity of protected variables is not the same over time

    void swap() {
        int lx, ly;
        synchronized(this) { lx = this.x; ly = this.y; }
        synchronized(this) { this.x = ly; this.y = lx; }
    }
    void reset() {
        synchronized(this) { this.x = 0; }
        synchronized(this) { this.y = 0; }
    }

Member access is synchronized, but swap and reset may interfere!
=> Need analysis not just for variables (what locks protect them?) but also starting from locks (what variable sets are covered by each?)

Completely explores program executions
- simulates nondeterminism through a custom virtual machine which allows choosing scheduling variants at each step and returning to unexplored ones (similar to backtracking)
- works at bytecode level; allows checking deadlocks / exceptional conditions / assertions in code
- limited to smaller programs (10 kloc): "state space explosion"
  - size of stored states (number of program variables)
  - number of possible traces (exponential in number of threads)

Lu, Park, Seo, Zhou: Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS'08
Research questions:
- what kinds of real bugs can be detected?
- are assumptions valid? e.g. focus on single-variable access
- how helpful are tools in diagnosing and fixing?
105 randomly selected real world concurrency bugs: 74 non-deadlock bugs + 31 deadlock bugs
4 large open-source programs: Apache, Mozilla, MySQL, OpenOffice
- 97% (non-deadlock): two patterns, atomicity or order violation; the latter not well addressed by tools
- 97% (deadlock): two threads, circular wait
- 96% reproducible with a partial order between 2 threads
- 92% manifest with a certain order between at most 4 memory accesses
- 66% involved only one variable
- 22% (deadlock) caused by one thread acquiring a resource held by itself
- 73% of non-deadlock bugs were not fixed by simply adding or changing locks
- 61% (deadlock) fix: prevent a thread from acquiring a lock; may cause other bugs
Transactional memory could avoid 39% of bugs + 42% more by addressing some concerns (I/O, atomic GC)

R. Xin et al., An Automation-assisted Empirical Study on Lock Usage for Concurrent Programs, ICSM 2013
4 programs: Aget, Apache httpd, MySQL, Pbzip2, up to 786 Kloc
issues to study:
- (language) characteristics of lock usage (function lock counts)
- lock usage patterns
- lock usage evolution
Results:
- 80% of the lock related functions acquire only one lock
- simple lock patterns account for 55% of all lock usage
- only 12 out of 527 detected patterns are conditional (more error-prone)
- only 0.65% of functions are lock related

Wojcicki & Strooper, A State-of-Practice Questionnaire on Verification and Validation for Concurrent Programs, PADTAD'06
35 survey respondents, Java development
Relevant defects: deadlock, interference (> 80%), starvation (50%)
Techniques: code inspection, JUnit test (> 80%), static analysis (50%, mostly FindBugs), code coverage, model checking (20%)

Kester, Mwebesa, Bradbury (SCAM 2010): How Good is Static Analysis at Finding Concurrency Bugs?
used 12 benchmarks from Java PathFinder and IBM ConTest
evaluated 3 tools: FindBugs, JLint, Chord
recall: 30-33% of actual known bugs
precision: 100% (Chord), 78% (JLint), 31% (FindBugs)
Threat to validity: small-scale evaluation (13 bugs)

Sadowski & Yi, How Developers Use Data Race Detection Tools, SPLASH PLATEAU'14
Two data race analysis mechanisms: ThreadSafety and TSan
ThreadSafety: static, annotation-based, implemented in Clang
- led to 18 bug-fixing commits (1 month) in a small section of code
TSan (ThreadSanitizer): dynamic identification of data races
- TSan v1: Valgrind, 20-300x slowdown
- TSan v2: LLVM, happens-before, 5-15x slowdown
- TSan in 30 min found a Chrome bug hunted for 6 months

Team A: ThreadSafety for docs, nightly runs of TSan find 1 race per 10 weeks
Team B: added annotations to all core libraries; ensures annotation for all mutexes (automatically searched)
Team C: stable synch code, no payoff for ThreadSafety, had not heard of TSan
Team D: ThreadSafety for tricky code, had not heard of TSan

Reproducibility & low false positives are important
Team culture matters
Tradeoff: races vs deadlocks (crash is easy, inconsistency is hard)
Manual inspection is the implicit comparison point
Good docs are important for building mental models
Limitations: slow speed and lack of coverage (TSan), difficulty of annotation (ThreadSafety)

Marius Minea marius@cs.upt.ro 20 November 2017

Any object (variable x, array element, structure field) of type T has an address of type T * where its value is stored [...], it is a pointer
Valid addresses are non-null
NULL indicates an invalid address; NULL is (void *)0, i.e., 0 cast to type void *
An address is a numeric value, but not of type int or unsigned; it may be printed with format specifier %p in printf
For low-level systems programming: types intptr_t and uintptr_t (from stdint.h) are the right size to hold a void *

We need to know how to
1. declare a variable of pointer (address) type
2. obtain a pointer (address) value
3. use a pointer (address) value
To use pointers correctly, need to (like for all variables/values):
1. be aware of
their type
2. initialize them correctly
3. use the right operators/functions

type *ptrvar; => the variable ptrvar may contain the address of a value of type type
Examples: char *s; int *p;
When declaring several pointers, need * for each of them:
    int *p, *q;    two integer pointers
    int *p, q;     one pointer p and one integer q
initialize pointers in declarations wherever possible
like with any variable: don't risk using uninitialized values

From an array name (a pointer):
    int tab[N], *a = tab;    same as: int tab[N]; int *a; a = tab;
Declaring int tab[N]; array name tab has type int *
Taking the address of a variable:
    int n, *p = &n;    same as: int n; int *p; p = &n;
A string constant is a pointer to the contents (to first char):
    char *s = "test";    same as: char *s; s = "test";

The dereferencing (indirection) operator *: prefix operator, gives the object located at a given address
operand: pointer (address); result: object (variable) indicated by pointer
*p is an lvalue (can be assigned, like a variable); can also be used in an expression, like any value of that type
int *p says int * is the type of p and int is the type of *p
The operator * is the inverse of &:
*&x is the object at the address of x, that is, x
&*p is the address of the object at address p, that is, p
    int x, y, *p = &x; y = *p; *p = y;
x has type int => &x has type int *
p has type int * => *p has type int

Any variable has an address => pointer variables have addresses
Any expression has a type:
The address of a variable of type int has type int *
The address of a variable of type int * has type int **, etc.
Having declared int *p; => we can declare int **pp; the type of &p is int **, and initialize/assign it with &p
T * p; may be read:
    T* p;    p has type T *
    T *p;    *p has type T

    Variable          Value    Address
    int x = 5;        5        0x408
    int *p = &x;      0x408    0x51C
    int **pp = &p;    0x51C    0x9D0

    char **s;      address of char address
    char *t[8];    array of 8 char addresses

A declaration with initialization is NOT an assignment!
The * in a declaration is not applied as an indirection operator: it is written next to the declared variable, but belongs to the type
Declaration int *p; suggests that *p is an int, but the variable declared is p, NOT *p (*p is not an identifier), so the initializer is for p, NOT for *p
    int t[2] = { 3, 5 };    initializes t    WRONG: t = { 3, 5 };
    int x, *p = &x;    is like int x; int *p; p = &x;    (p is initialized/assigned, NOT *p)
    *p = &x is a type error!
    char *p = "str";    is char *p; p = "str";    WRONG: *p = "str";

Programs can't have just pointers. These must point to something (useful data: need variables to store it in)
Declaring int n; means I want to have an integer. What for? What value does it have?
=> Better: initialize it in the declaration
Declaring char *p; only means I want to use the address of a char
Need:
    char *p = buf;                p points to array buf, declared before
    char *p = "...";              p points to a string constant
    char *p = strchr(buf, ...);   returned by function, could be NULL

It's an error to use uninitialized variables:
    int sum; for (i = 0; i < n; i++) sum += a[i];    // initially??
program behavior is undefined (best case: random initial value)
Pointers must be initialized, like any variables: with a valid address (of a variable), or an initialized pointer, or with a dynamically allocated address (later)
    ERROR: int *p; *p = 0;
    ERROR: char *p; scanf("%20s", p);
p is uninitialized (best case NULL, if global variable); the value will be written to an unknown memory address; program crash is the luckiest case!
WARNING: a pointer is not an int    WRONG: int *p = 640;
Address space is determined by the system, not the user; we cannot use an arbitrary address we want

A function cannot change a variable passed as parameter, because the value is passed, not the variable itself
    void nochange(int x) { ++x; printf("%d\n", x); }
    int main(void) { int a = 5; nochange(a); printf("%d\n", a); }
But, with a variable's address p, we may read its value: x = *p; we may modify it: *p = ...;
Having a variable's address, a function may assign to it (e.g. scanf)
    void swap(int *pa, int *pb) { int tmp = *pa; *pa = *pb; *pb = tmp; }
    int x = 3, y = 5; swap(&x, &y);

We use pointers
- to pass arrays (can't pass array in C)
- to return several results (return allows only one), e.g. min/max of an array; result/error code
When passing an array to a function, the address is passed: in int tab[LEN]; the name tab has type int *
    void f(int a[]) is the same as void f(int *a)

Variants of printf/scanf with strings as source/destination
    int sprintf(char *s, const char *format, ...);
    int sscanf(const char *s, const char *format, ...);
sprintf has no size limit => may overflow buffer
Use instead: int snprintf(char *str, size_t size, const char *format, ...);
writing is limited to size chars including \0: safe option

    int n; char s[] = "..."; if (sscanf(s, "%d", &n) == 1) ...
(but we don't know where processing of string stopped)
    long strtol(const char *s, char **endptr, int base);
assigns to *endptr the address of first unprocessed char (if not needed, pass 2nd arg NULL)
if base is 0, accepts octal/decimal/hex (as in C, like %i in scanf)
    char *end; n = strtol(s, &end, 10);
(similar functions exist for other numeric types, e.g. strtoul, strtod)
for base 10: n = atoi(s); returns 0 on error, but also for "0"; use only when string known to be good

Command line: program name with arguments (options, files, etc.)
Examples: gcc -Wall prog.c or ls directory or cp file1 file2
main can access the command line if declared with 2 args (conventionally named as follows):
    int argc        count of words in command line (1 + arguments)
    char *argv[]    arguments: array of strings, ends with NULL

    int main(int argc, char *argv[]) {
        printf("...", argv[0]);
        if (argc == 1) puts("...");
        else for (int i = 1; i [...]

Arrays and pointers: int a[N]; int *pa = a;
[...] uses memory; (array has fixed address)
*a and *pa: indirections with different operations in machine code:
*a references the object from a fixed address (direct addressing)
*pa must first get the value of variable pa, loading it from &pa, then dereference it (indirect addressing)

    char s[] = "test";    s[0] is 't', s[1] is 'e', etc.
s is a constant address (char *), not a variable in memory
CANNOT assign s = ...; but may assign s[0] = 'f'
sizeof(s) is 5 * sizeof(char)
&s is s, but its type is address of 5-char array: char (*)[5]
sizeof (entire array) is not strlen (up to '\0')

    char *p = "test";    p[0] is 't', p[1] is 'e' (same)
p is a variable (char *), has a memory location
CANNOT assign p[0] = 'f' ("test" is a string constant)
can assign p = s; then p[0] = 'f'; can assign p = "...";
sizeof(p) is sizeof(char *)
&p is NOT p => WRONG: scanf("%4s", &p); RIGHT: scanf("%4s", p); (if p is a valid address and has room)

A variable v of type T uses sizeof(T) bytes
=> &v + 1 is the address after v's space (next object)
&v + 1 is the value of &v plus sizeof(T) bytes
+ on a pointer increments by the object size (not one byte)

1. Sum of pointer and integer: like address of array element
a + i means &a[i]; *(a + i) means a[i]; 3[a] is a[3]
a + i means i objects past a, NOT i bytes past a
for char *a, 1 element = 1 byte => the number also means bytes
increment ++a, a++: a becomes a+1 before/after evaluation
Pointer arithmetic is only valid within the same array object
exception: can take the address just beyond (at end) of array
    int a[LEN], *end = a + LEN;
a+LEN+1 is not a valid address (beyond legal memory access)
C has no overflow checks! Careful with indices!

2. Difference: only for pointers of the same type (and in same array!)
= number of objects of type between the two addresses &a[j] - &a[i] == j - i To get the number of bytes, (cast) pointers to * p - q == (( *)p - ( *)q)   (T) No other arithmetic operations between pointers are defined! May use comparison operators: ==, ! = , can’t dereference, can’t do arithmetic But: * are assignment-compatible with any pointer Useful for writing functions that accept any pointer Cast * to * to do arithmetic: ( * , , ) { ( *p = ( *)a + cnt * size; —p >= a; ) *p = ++ (and —) have higher precedence than * (indirection) *p++ ++ applies to p: take value, (post)increment pointer value is object original pointer value *++p increments pointer, then dereferences value is next object original pointer value (*p)++ (post)increments the value at address p expression has the value increment ++*p (pre)increments value at address p expression has the value increment same meaning: "to indicate" = "to point to" To write a[i], need two variables and one addition (base + offset) and multiplication with size of type (if not char, of size 1) Simpler: directly with pointer to element &a[i] (a+i) increment pointer rather than index when traversing array *strchr i( *s, ) { ( = 0; s[i]; ++i) (s[i] == c) s + i; NULL; *strchr p( *s, ) { ( ;*s; ++s) (*s == c) NULL; s; *strcat i( { = O, j; (dest[i]) ++i; (j =0; src[j]; ++j) dest[i+j] = src[j]; dest[i+j] = ; dest; *strcat p( *dest, *d = dest; (*d) ++d; (*d++ = *src++); dest; *dest, *src) *src) A bidimensional array (matrix) is declared as type a[DiMl] [DiM2]; for instance [DiM1] [DiM2] ; a[i] is constant address ( *) of an array of DiM2 elements (line of the matrix) a[i] [j] is jth element in array a[i] of DiM2 elements &a[i] [j] or a[i]+j is DiM2*i+j elements after address a function with array parameter needs all dimensions except first => must declare as sometype f ( П [DiM2]); a[i] which is *(a+i) means i lines (xDiM2 elements) after a => a has type (*) [ ] (pointer to array of DiM2 ints) t ={ t is matrix (2-D char array) 
    char *p[12] = { "jan", "feb", ..., "dec" };       p is an array of pointers
- t uses 12 * 4 bytes; p uses 12 pointers (+ 12 * 4 bytes for the string constants)
- t[6] = "..." is WRONG; p[6] = "..." changes an address
- t[6] is the constant address of line 7; p[6] is element 7 of the pointer array p
- can do strcpy(t[6], "...") or strncpy

Declare loop variables in the loop header whenever possible (since C99): enforces scope, visually clear, avoids affecting other loops.
Indices or pointers: use whatever results in simpler, understandable code.

Matrix parameters can have dimensions given by other parameters (since C99), e.g. a[m][n], b[n][p], c[m][p] for a matrix product; the dimension parameters must be declared before the arrays that use them.

Functions can be passed as parameters through function pointers. Standard example:
    void qsort(void *base, size_t nmemb, size_t size,
               int (*compar)(const void *, const void *));
- address of array to sort, element count and size
- address of comparison function, returns int (<, = or > 0); it has void * arguments, compatible with pointers of any type
    typedef int (*cmp_t)(const void *, const void *);     /* typedef name is a placeholder */
    int intcmp(int *p1, int *p2) { return *p1 - *p2; }
    int tab[] = { -6, 3, 2, -4, 0 };
    qsort(tab, 5, sizeof(int), (cmp_t)intcmp);
Can also declare the function with void *, and do the cast in the function (this is the portable variant):
    int intcmp(const void *p1, const void *p2) { return *(int *)p1 - *(int *)p2; }
    qsort(tab, 5, sizeof(int), intcmp);

When to use pointers
When the language forces us to:
- arrays (memory blocks) cannot be passed / returned from functions, only their addresses (array name is its address); addresses carry no size information => must pass a size parameter
- strings: a string (constant or not) is a char *; need not pass size, since null-terminated
- functions: a function name is its address
When a function needs to modify a variable passed from outside: pass the address of the variable.
Any address passed to a function needs to be valid (point to allocated memory): functions assume their pointer arguments are valid.

Testing Object-Oriented Software (22 November 2017)

Each subclass creates a new context for inherited features
=> correctness of superclass does not guarantee that of subclass
Q: Do superclass methods work correctly within the context of the subclass?
For B inheriting method m from A, we should know:
1. can we completely skip re-testing B.m?
2. are the test cases for A.m enough?
3. or do we need new test cases? which?
Substitution principle: a subclass can be used anywhere instead of its superclass
    pre(m, Class) => pre(m, SubClass)
    post(m, SubClass) => post(m, Class)
    inv(SubClass) => inv(Class)
But: we must know the invariants to check them. At the minimum, we analyze ...

    class Rectangle {
        int height, width;
        int getArea() { return height * width; }
        void setHeight(int value) { height = value; }
        void setWidth(int value) { width = value; }
    }
    class Square extends Rectangle {
        void setHeight(int value) { super.setHeight(value); super.setWidth(value); }
        void setWidth(int value) { super.setWidth(value); super.setHeight(value); }
    }

- Interactions between ... and ... are complex. Are there undesired interactions between methods?
- Polymorphism and dynamic binding increase the number of execution paths and make static analysis more difficult: void foo(A obj) { obj.m(); } could call method m of any subclass of A
- Encapsulation limits the visibility of state when testing
- ... increases potential for misunderstanding and error
- ... more likely due to many small components
- Control of ... is difficult: ... throughout the program

Sources of errors [McGregor & Sykes]
Due to fundamental language constructs:
- information hiding: harder to observe state in testing
- objects have persistent state => inconsistency can cause errors later; they have a lifetime => errors when constructed / destructed at the wrong time
- messages (method calls): important for testing object interactions; may be called in an improper object state; have parameters (used / updated): are those in the right state? do they correctly implement their interfaces? (subtyping errors)

Interface = behavioral specification
Preconditions for correct behavior may be handled in two ways:
- contract-based: assumed
- defensive programming: checked
=> influences the complexity of implementation and testing; simplifies / complicates class / integration testing
Note: defensive programming should also check results! (although in practice, often the receiver is considered trustworthy, only the caller not)
- specification: method pre / postconditions and class invariants must be tested! The specification must also be validated.
- implementation: error opportunities through constructors / destructors (incorrect initialization / deallocation)
- inter-class collaboration: members or object parameters may have errors. Does a client have the means to check preconditions? (hidden state?)

Inheritance
- may propagate errors to descendants => stop through timely testing
- typical OO code style: short methods, little processing, many calls => code / decision coverage loses relevance
- offers a mechanism for test reuse, from super- to subclass
- testing may detect inheritance used just for code reuse, without inheriting the specification
- testing must check that the substitution principle is observed
From the perspective of states in program testing, a subclass
- keeps all observable states and transitions among them
- may add transitions (supplementary behavior)
- may add observable states (sub-states of initial ones)
Dynamic binding: difficulty of understanding / testing the sequence of calls => likely error: calling the wrong method implementation from the hierarchy
Abstraction in the class hierarchy is reflected in the tests (general -> specific)

Test adequacy axioms [Weyuker '86, '88], reformulated for OO by [Perry & Kaiser '90]
Antiextensionality: different implementations of the same functionality need different tests
1) A redefined method needs other / more tests (depending on code)
2) The same method, when inherited, needs different class-based tests
e.g.: A: +m(), +n()    B: +m()    C: +n()    m calls n()
=> C::m inherits B::m but calls another n() => different tests!
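The A / B / C example above can be mimicked in plain C, which may help to see why the inherited m needs new tests. The sketch below is only an emulation under stated assumptions: C has no classes, so each "class" is a hand-written table of function pointers, and all names (struct vtable, B_m, C_n, call_m, make_B, make_C) and the arithmetic inside the methods are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of the hierarchy: A defines m and n,
   B redefines m, C (derived from B) redefines n. */
struct object;
struct vtable {
    int (*m)(const struct object *self);
    int (*n)(const struct object *self);
};
struct object {
    const struct vtable *vt;   /* which "class" the object belongs to */
    int value;
};

/* A::n, inherited unchanged by B */
static int A_n(const struct object *self) { return self->value; }

/* C::n, the redefinition */
static int C_n(const struct object *self) { return -self->value; }

/* B::m calls n() through the table, so the callee depends on the receiver */
static int B_m(const struct object *self) { return 2 * self->vt->n(self); }

static const struct vtable B_vtable = { B_m, A_n };   /* B: own m, inherited n */
static const struct vtable C_vtable = { B_m, C_n };   /* C: inherited m, own n */

/* Client code: a dynamically bound call to m */
int call_m(const struct object *obj)
{
    assert(obj != NULL && obj->vt != NULL);
    return obj->vt->m(obj);
}

struct object make_B(int value) { struct object o = { &B_vtable, value }; return o; }
struct object make_C(int value) { struct object o = { &C_vtable, value }; return o; }
```

The code of m is identical for both receivers, yet a test written for a B object says nothing about the path through C_n; a test with a C receiver is needed as well.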
Antidecomposition: a test set adequate for a program need not be adequate for one of its components (the component could be exercised in a different context than in that program)
=> adequate testing for a client is insufficient for a library (the client could use only part of the functionality)
=> if deriving from a tested class, must still test inherited methods (added code may interact with the state => with inherited methods)

Anticomposition: a test set adequate for components need not be adequate for their combination
Brief argument for sequential combination: p program paths in P and q paths in Q => $p \cdot q > p + q$ paths in P; Q (for p, q > 2); even more when execution alternates between P and Q
=> unit / module testing cannot replace integration testing!
=> a method tested in the base class is not tested sufficiently in the derived class (it may be composed in different ways)

General multiple change: programs with the same control flow but different operations / values need different test suites

Example: Set class with methods
    add(element)      // precondition: element not in set
                      // raise Duplicate exception otherwise
    remove(element)
Testing: two consecutive add(x) raise the exception, but the element might still be added a second time; the error is discovered only with 2 x add, 2 x remove
=> harder to test than with directly observable object state

Problem: implementing a class requires understanding the details and representation conditions of all base classes, to be sure of a correct implementation
Two main classes of problems:
1) initialization: forgetting correct initialization of the superclass
2) forgetting to redefine a method to account for class specifics: copy methods or isEqual

Q: what are the relevant object / method combinations to consider?
- target-methods: all callable method implementations
- receiver-classes: all possible receiver classes
Example [Rountev, Milanova, Ryder 2004]:
    class A { public void m() { ... } }
    class B extends A { public void m() { ... } }
    class C extends A { ... }
    A a; ... a.m();
target-methods: test calls to A.m(), B.m()
receiver-classes (more comprehensive): test a of type A, B, C

Deriv used inconsistently also as Base
e.g.: Stack (access at one end) derived from Vector (indexed access); using Vector::removeAt(idx) on a Stack violates the class invariant
Cause: design error. Detection: test class invariants

- ...: intra- and inter-method, intra- and inter-class
- Hidden state (caused by encapsulation): explicit flattening of the class hierarchy; better: allowing data access by the testing framework; or: use getter methods to access state
- Polymorphism: tests need to instantiate all possible subtypes for an object declared as a base type; static analysis to find all possibilities (class hierarchy analysis)
- Dataflow testing: data and changed state are important; line / branch coverage gives little info on small method bodies
- Coupling: defined by def-use pairs between methods, i.e. a member defined (written) in m1() and used (read) by m2(); used to select methods that are tested together

Distinguish: tests starting from specification (S) or implementation (I) (code)
S: new tests for old methods, when the specification changes
S: new postconditions / invariants for old tests in derived classes
I: new tests for new methods, depending on desired coverage
Examples:
- Change a method m(): retest methods that interact: methods calling m and that have ... with m
- Change m() in superclass: re-test m() + interacting methods; re-test m() in the context of subclass(es)
- Overwrite m(): augment tests of Base::m for adequate coverage
- Overwrite m() used by Base::n: test n in subclass
- Change of interface (abstract class): re-test whole hierarchy

Test patterns
At method level:
- Category Partition (I/O analysis, partitioning by equivalence)
- Combinational Function Test (condition coverage)
- Recursive Function Test
- Polymorphic Message Test (client of a polymorphic server)
At class level:
- Invariant Boundaries (valid / invalid values for class invariant)
- Nonmodal Class Test (class without sequencing constraints)
- Modal Class Test (class with sequencing constraints)
- Quasi-Modal Class Test (constraints dependent on state)
For ...:
- Abstract Class Test (interface)
- Generic Class Test (parameterized)
- New Framework Test
- Popular Framework Test (changes in an API)

Polymorphic message test
For a virtual method call (in a client), test all possible classes to which the call could be made.
Need to deal with potential errors:
- incorrect preconditions on call for some subclasses
- call to unintended class (reference to unintended type)
- change of class hierarchy (affects code / tests)
Dynamic binding is similar to a (multi-way) branch in code => covering all instances is like branch coverage

Nonmodal class: accepts any method call in any state, e.g. DateTime accepts any sequence of get / set (use / def)
Types of test behavior:
- define-operation: set to valid input / check answer
- define-exception: set to invalid input / check answer
- define-exception-corruption: state not corrupt after exception
- use-exception-test: normal return after use
- use-correct-return: return with correct value after use
- use-corruption: object not corrupt after use

Modal class: class with fixed constraints on operation order; create a model with object states and transitions between them
Problems:
- missing transition: an operation is rejected in a valid state
- incorrect action / response for a method in a given state
- invalid resulting state: method causes transition to wrong state
- corrupt resulting state
- message accepted when it should be rejected
Quasi-modal class: method order constraints change depending on state, e.g. container / collection classes (full / empty), etc.
Typically, we'd like N+ coverage
(any method in any state)

... approach
- write class, write tests, run (no other details / intermediate steps)
- good for simple classes in stable contexts
... approach
- run the object from creation to destruction through all methods: constructors, accessors (get), predicates, modifiers (set), iterators, destructors

Formal verification, Lecture 9: Static analysis
Marius Minea, December 15, 2005

Static analysis techniques
- Dataflow analysis: mainly techniques originating in compiler construction; emphasizes the tradeoff between precision and efficiency
- Constraint-based analysis: general framework for solving analysis problems by representing them as constraint relations between sets: generic and efficient algorithms
- Abstract interpretation: simplifies the program by defining a semantics that considers only those aspects relevant for the desired property
- Type systems: by defining an appropriate type system, many properties can be converted to type checking or type inference problems

Dataflow analysis
Techniques originating in the compiler domain
- used for code generation (e.g. register allocation)
- and code optimization (constant propagation, lifting common subexpressions, detecting unused variables, etc.)
Techniques have evolved and have been unified into a general framework applicable also to other code analysis problems.
Basic approach:
- construct the program control flow graph (CFG)
- observe how properties of interest change during program execution (upon traversing the nodes / edges of the CFG)

Control flow graph
A representation in which:
- nodes are statements
- edges indicate sequencing of statements
=> we can have nodes with:
- a single successor (straight-line code, e.g. assignments)
- several successors (branch statements)
- several predecessors (join after branching)
Alternative representation:
- nodes are program points
- edges are statements together with their effects
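One possible in-memory form of the first representation (nodes are statements, edges give the sequencing) is sketched below. It is only an illustration: the names struct cfg, cfg_add_edge and cfg_example, the fixed capacities, and the five-statement fragment are assumptions, not part of the lecture.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_NODES = 16, MAX_EDGES = 4 };

/* One node per statement; succ and pred hold indices of other nodes. */
struct cfg_node {
    int succ[MAX_EDGES];
    size_t nsucc;
    int pred[MAX_EDGES];
    size_t npred;
};

struct cfg {
    struct cfg_node node[MAX_NODES];
    size_t nnodes;
};

/* Adds the edge from -> to and records it on both endpoints.
   Returns 0 on success, -1 if an index is out of range or a list is full. */
int cfg_add_edge(struct cfg *g, int from, int to)
{
    if (from < 0 || to < 0 ||
        (size_t)from >= g->nnodes || (size_t)to >= g->nnodes)
        return -1;
    struct cfg_node *f = &g->node[from];
    struct cfg_node *t = &g->node[to];
    if (f->nsucc == MAX_EDGES || t->npred == MAX_EDGES)
        return -1;
    f->succ[f->nsucc++] = to;
    t->pred[t->npred++] = from;
    return 0;
}

/* Builds the graph of the hypothetical fragment
     0: x = read();
     1: if (x > 0)
     2:     y = x;
     3: else y = -x;
     4: print(y);                                   */
struct cfg cfg_example(void)
{
    struct cfg g = { .nnodes = 5 };   /* all edge lists start empty */
    int rc = 0;
    rc |= cfg_add_edge(&g, 0, 1);     /* straight-line code: one successor */
    rc |= cfg_add_edge(&g, 1, 2);     /* branch: two successors            */
    rc |= cfg_add_edge(&g, 1, 3);
    rc |= cfg_add_edge(&g, 2, 4);     /* join: two predecessors            */
    rc |= cfg_add_edge(&g, 3, 4);
    assert(rc == 0);
    return g;
}
```

Storing predecessors next to successors costs a little memory, but lets both forward and backward analyses walk the graph in their own direction.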
Notation
- G = (N, E): control flow graph (N: nodes; E: edges)
- s: one program statement (node in the control flow graph)
- entry, exit: program entry and exit points
- in(s): set of edges that have s as destination
- out(s): set of edges that have s as source
- src(e), dest(e): source and destination of edge e
- pred(s): set of predecessors of statement s
- succ(s): set of successors of statement s
With these notions, we write dataflow equations that describe how the analyzed values (dataflow facts) change from one statement to the next; we use subscripts in and out for the value analyzed at entry to and exit from statement s.

Reaching definitions
What are all definitions (assignments) that can reach the current program point (before their assigned values are overwritten)?
Elements of interest are pairs: (variable, source line of definition).
For each statement (identified by its label l) we are interested in the value before and after its execution: $RD_{in}(s)$ and $RD_{out}(s)$
- the initial node in the graph is not reached by any definition:
  $RD_{out}(entry) = \{(v, ?) \mid v \in V\}$
- an assignment $l: v \leftarrow e$ erases all previous definitions for variable v (but not for other variables) and introduces the current line (definition):
  $RD_{out}(l: v \leftarrow e) = (RD_{in}(l: v \leftarrow e) \setminus \{(v, \ast)\}) \cup \{(v, l)\}$, where $(v, \ast)$ stands for any pair with first component v
- definitions on entry of a statement are the union of definitions at exit of the predecessor instructions:
  $RD_{in}(s) = \bigcup_{s' \in pred(s)} RD_{out}(s')$

Live variables
At each program point, what are the variables whose value will be used on at least one of the possible program paths from this point?
(useful in compilers for register allocation)
Transfer function:
  $LV_{in}(s) = (LV_{out}(s) \setminus write(s)) \cup read(s)$
(a variable is live before s if it is read by s, or it is live after s without being written by s)
=> direction of analysis is backward
Operation for combining values when joining paths (meet):
  $LV_{out}(s) = \emptyset$ if $succ(s) = \emptyset$, and $LV_{out}(s) = \bigcup_{s' \in succ(s)} LV_{in}(s')$ otherwise
=> combination done by union (may, on at least one path)

Available expressions
At each program point, what are the expressions whose value has been previously computed, without it having changed, on all paths to this point?
(if the value is stored in a register, it need not be recomputed)
Transfer function:
  $AE_{out}(s) = (AE_{in}(s) \setminus \{e \mid V(e) \cap write(s) \neq \emptyset\}) \cup \{e \in Subexp(s) \mid V(e) \cap write(s) = \emptyset\}$
(expressions on entry to s which have no variables modified by s, and any expressions computed at s without changes in their variables)
Combination operation (meet):
  $AE_{in}(s) = \emptyset$ if $pred(s) = \emptyset$, and $AE_{in}(s) = \bigcap_{s' \in pred(s)} AE_{out}(s')$ otherwise
=> combination done by intersection (must, on all paths); the analysis is forward

Very busy expressions
What are the expressions which must be evaluated on any path from the current program point before the value of an appearing variable is modified?
=> evaluation can be lifted to the current point, before any branches
- a backward analysis, universally quantified (must)
  $VBE_{in}(s) = (VBE_{out}(s) \setminus \{e \mid V(e) \cap write(s) \neq \emptyset\}) \cup Subexp(s)$
  $VBE_{out}(s) = \emptyset$ if $succ(s) = \emptyset$, and $VBE_{out}(s) = \bigcap_{s' \in succ(s)} VBE_{in}(s')$ otherwise

Properties analyzed
Concretely: we might wish to analyze several properties, such as:
- the value of a variable at a program point
- or the interval of values for a variable
- sets of variables (live), expressions (available, very busy), possible definitions for a variable (reaching definitions), etc.
Abstractly: a set D of values for a property (dataflow facts)
Restriction: D is a finite set

What the analyses above have in common:
- we have associated with program points sets of values for the analyzed property
- we have iteratively recomputed the corresponding sets, by union or intersection operations, enlarging or restricting the set of values
What are the essential properties that allow this kind of calculation?
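As an illustration of "iteratively recomputing the sets", the live-variable equations can be evaluated with variable sets stored as bit masks. This is a minimal sketch under stated assumptions: struct stmt, live_variables, the VAR_ constants and the five-statement example program are hypothetical, and the loop simply re-evaluates every equation until nothing changes.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SUCC = 2 };
enum { VAR_X = 1, VAR_Y = 2, VAR_Z = 4 };   /* one bit per variable */

/* Hypothetical statement record for the sketch. */
struct stmt {
    unsigned read;          /* variables read by the statement    */
    unsigned write;         /* variables written by the statement */
    int succ[MAX_SUCC];     /* successor indices, -1 if unused    */
};

/* Example program (an assumption, with a loop from 3 back to 1):
     0: x = 5
     1: y = x + 1
     2: if (y > 0) goto 3 else goto 4
     3: x = x - 1; goto 1
     4: z = y                                                    */
const struct stmt example_prog[5] = {
    { 0,     VAR_X, {  1, -1 } },
    { VAR_X, VAR_Y, {  2, -1 } },
    { VAR_Y, 0,     {  3,  4 } },
    { VAR_X, VAR_X, {  1, -1 } },
    { VAR_Y, VAR_Z, { -1, -1 } },
};

/* Computes LV_in and LV_out for n statements. The sets start empty and
   only grow inside a finite universe, so the loop terminates. */
void live_variables(const struct stmt *prog, size_t n,
                    unsigned *lv_in, unsigned *lv_out)
{
    assert(prog != NULL && lv_in != NULL && lv_out != NULL);
    for (size_t s = 0; s < n; ++s)
        lv_in[s] = lv_out[s] = 0;

    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t k = n; k-- > 0; ) {     /* backward analysis: last statement first */
            unsigned out = 0;
            for (int j = 0; j < MAX_SUCC; ++j)
                if (prog[k].succ[j] >= 0)
                    out |= lv_in[prog[k].succ[j]];   /* union over successors */
            /* LV_in(s) = (LV_out(s) \ write(s)) U read(s) */
            unsigned in = (out & ~prog[k].write) | prog[k].read;
            if (in != lv_in[k] || out != lv_out[k]) {
                lv_in[k] = in;
                lv_out[k] = out;
                changed = 1;
            }
        }
    }
}
```

On the example program, x is live around the loop (statements 1 to 3), y is live from its definition up to statement 4, and z is never live, so the assignment to z could be flagged as unused.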
Partial order
$(L, \sqsubseteq)$ is a set equipped with a partial order $\sqsubseteq \, \subseteq L \times L$, i.e., a relation which is:
- reflexive: $x \sqsubseteq x$ for any $x \in L$
- transitive: $x \sqsubseteq y \wedge y \sqsubseteq z \Rightarrow x \sqsubseteq z$ for any $x, y, z \in L$
- antisymmetric: $x \sqsubseteq y \wedge y \sqsubseteq x \Rightarrow x = y$ for any $x, y \in L$
Example: powerset $(P(D), \subseteq)$ or $(P(D), \supseteq)$

Lattices
Lattice = a partially ordered set in which any finite nonempty subset has a least upper bound and a greatest lower bound; in a complete lattice this holds for any subset.
- $l_0$ is an upper bound of $Y \subseteq L$ if $\forall l \in Y$ we have $l \sqsubseteq l_0$
- $l_0$ is a lower bound of $Y \subseteq L$ if $\forall l \in Y$ we have $l_0 \sqsubseteq l$
Denote: $\bigsqcup Y$: the least upper bound of the set $Y \subseteq L$; $\bigsqcap Y$: the greatest lower bound of $Y \subseteq L$
  $\bot = \bigsqcup \emptyset = \bigsqcap L$, $\top = \bigsqcap \emptyset = \bigsqcup L$
We define the operations
  meet: $x \sqcap y = \bigsqcap \{x, y\}$
  join: $x \sqcup y = \bigsqcup \{x, y\}$
(for powerset: intersection, union)

The operations $\sqcap$ (meet) and $\sqcup$ (join) are:
- commutative
- associative
- $x \sqcap \bot = \bot$ and $x \sqcup \top = \top$, for any x
A distributive lattice: one in which the operators $\sqcap$ and $\sqcup$ are mutually distributive:
  $x \sqcap (y \sqcup z) = (x \sqcap y) \sqcup (x \sqcap z)$
  $x \sqcup (y \sqcap z) = (x \sqcup y) \sqcap (x \sqcup z)$

Transfer functions
Concretely: statements determine changes in the program state. The value of a variable after a statement is a function of its value at the beginning of the statement.
Abstractly: each statement s has associated a transfer function $F(s): L \to L$ that determines the way in which the value of the property at the beginning of the statement is modified by the statement:
  $Prop_{out}(s) = F(s)(Prop_{in}(s))$ (forward analyses), or conversely (backward analyses)
Restriction: we require transfer functions to be monotone:
  $x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$
(if we know more about the argument, we should know more about the result)
Particular case: bitvector frameworks: the lattice is a powerset P(D), transfer functions are monotone and of the form
  $F(s)(v) = (v \setminus kill(s)) \cup gen(s)$
(v = dataflow fact, gen / kill(s) = information generated / deleted in s)

Dataflow equations
Example for forward analyses:
  $Prop_{out}(s) = F(s)(Prop_{in}(s))$
  $Prop_{in}(s) = \bigsqcap_{s' \in pred(s)} Prop_{out}(s')$
where we denote by $\bigsqcap$ the effect of combining information (meet) on several paths (could be $\cap$ or $\cup$).
Initially, we know the value $Prop_{out}(entry)$.
For backward analyses, the roles of in and out change, and the value of $Prop_{in}(exit)$ is known.

Worklist algorithm
To compute the solution for the above equation system, we use an iterative algorithm which propagates changes in the direction of the analysis:
    foreach s ∈ N do Prop_out(s) = ⊤      /* no info */
    Prop_in(entry) = init                  // depending on the analysis
    W = {entry}
    while W ≠ ∅
        choose s ∈ W
        W = W \ {s}
        Prop_in(s) = ⊓ { Prop_out(s') | s' ∈ pred(s) }
        Prop_out(s) = F(s)(Prop_in(s))
        if change then forall s' ∈ succ(s) do W = W ∪ {s'}

Termination and fixpoints
Termination of the analysis is guaranteed if the transfer function is monotone: $x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$, which implies that computed properties change in a monotone way.
Def: a fixpoint for a function f: a value x for which $f(x) = x$
Tarski's theorem guarantees that a monotone function over a (complete) lattice has a minimal and a maximal fixpoint.
The worklist algorithm computes the minimal fixpoint for the given system of transfer functions.

Meet over all paths
We wish to compute the combined effect of program statements: for the sequence of instructions $p = s_1 s_2 \ldots s_n$ we define $F(p) = F(s_n) \circ \ldots \circ F(s_2) \circ F(s_1)$ and we wish to compute:
  $\bigsqcap_{p \in Path(Prog)} F(p)(entry)$
But the worklist algorithm combines the effects at each meet before computing further. Since functions are monotone, we have:
  $f(x \sqcup y) \sqsupseteq f(x) \sqcup f(y)$
thus the analysis loses precision.
For distributive transfer functions we have equality:
  $f(x) \sqcup f(y) = f(x \sqcup y)$
and it can be shown that the iterative worklist algorithm (the fixpoint solution) is equivalent to computing the solution by combining values over all possible paths (meet over all paths) => combining the individual
execution paths does not lose information
The examples given so far (live variables, etc.) are distributive

Classification of analyses:
- forward or backward
- must or may
- control flow sensitive or control flow insensitive: do we need to consider the order of statements in the program?
  - no: what variables are used / changed, what functions are called, etc.
  - yes: properties effectively depending on values computed by the program
- context dependent or context independent, for programs with procedures: is the analysis of each procedure specialized depending on its call point, or is a single analysis (procedure summary) employed?

Marius Minea, marius@cs.upt.ro, 21 November 2016

*p is NOT a pointer (unless p has a ** type): p is the pointer, *p is the value at address p
Programs work with data; pointers only point to data
Don't declare a pointer unless you have what it should point to
except: dynamic allocation (provides pointer and data space)
Initializing a pointer:
- *p = &s[i];             if array s is declared before
- *p = a string literal;  data is constant string
- *p = argv ...;          data put there by runtime system
Declare a variable and pass its address, for the function to fill in data:
- if (scanf(..., &n) == 1)
- char *end; ... = strtod(s, &end);
- x, y; swap(&x, &y);

The name of an array is a constant address
Declaring an array allocates a memory block for its elements
The array's name is the address of that block (of its first element)
&a[0] is same as a and a[0] is same as *a
Can declare sometyp a[LEN], *pa; and assign pa = a;
Similar: a and pa have same type: sometyp *
But: pa is a variable, uses memory; can assign pa = addr
a is a constant (array has fixed address); cannot assign a = addr
*a and *pa: indirections with different operations in machine code:
*a references the object from a constant address (direct addressing)
*pa must first get the value of variable pa (an address), loading it from the constant address &pa, then dereference it (indirect addressing)

In function declarations, these are the same (first becomes second):
size_t strlen(const char s[]); becomes size_t strlen(const char *s);
As array declarations they are different:
char s[] = "test";
- s[0] is 't', s[4] is '\0' etc.
- s is a constant address (char *), not a variable in memory
- CANNOT assign s = ..., but may assign s[0] = 'f'
- sizeof(s) is 5 * sizeof(char) (entire array), is not strlen (up to '\0')
- &s is s (same address) but with different type, address of 5-char array: char (*)[5]
char *p = "test";
- p[0] is 't', p[4] is '\0' (same)
- p is a variable (char *), has a memory location
- CANNOT assign p[0] = 'f' ("test" is a string constant)
- can assign p = ...; can do p = s; then p[0] = 'f';
- sizeof(p) is sizeof(char *)
- &p is NOT p ⇒ WRONG: scanf("%4s", &p); RIGHT: scanf("%4s", p); (if p is valid address and has room)

A variable v of type T takes up sizeof(T) bytes
⇒ &v + 1 is the address after the space allocated to v
&v + 1 is value of &v plus sizeof(T) bytes
+ 1 on a pointer increments by an element (not a byte)
1. Pointer and integer: like address of array element
   a + i means &a[i] and *(a + i) means a[i] (so 3[a] is a[3])
   a + i means i elements past a, NOT i bytes past a
   for char *a, 1 element = 1 byte ⇒ number added means bytes
   increment ++a, a++: a becomes a + 1 before / after evaluation
2. Difference: only for pointers of the same type T (and in same array!)
   = number of objects of type T that fit between the two addresses
   To get the number of bytes, convert (cast) pointers to char *:
   p - q == ((char *)p - (char *)q) / sizeof(T)
No other arithmetic operations between pointers are defined!
May use comparison operators: ==, !=, <, <=, >, >= (e.g., a loop that tests p against a and assigns through *p)

Postfix ++ (and --) have higher precedence than * (indirection); prefix ++ / -- and * group right to left
*p++     ++ applies to p: take value, (post)increment pointer
(*p)++   (post)increments the value at address p
*++p     takes value after incrementing pointer
++*p     increments value at pointer (expression has that value)

Index and pointer have the same meaning: "to indicate" = "to point to"
To write a[i], need two variables and one addition (base + offset), and multiplication with size of type (if not char, of size 1)
Simpler: work directly with pointer to element, &a[i] (a + i); increment pointer rather than index when traversing array

  char *strchr_i(char *s, int c) {
    for (int i = 0; s[i]; ++i)
      if (s[i] == c) return s + i;
    return NULL;
  }
  char *strchr_p(char *s, int c) {
    for ( ; *s; ++s)
      if (*s == c) return s;
    return NULL;
  }

  char *strcat_i(char *dest, const char *src) {
    int i = 0, j;
    while (dest[i]) ++i;
    for (j = 0; src[j]; ++j) dest[i+j] = src[j];
    dest[i+j] = '\0';
    return dest;
  }
  char *strcat_p(char *dest, const char *src) {
    char *d = dest;
    while (*d) ++d;
    while ((*d++ = *src++));
    return dest;
  }

A bidimensional array (matrix) is declared as type a[DIM1][DIM2];
a[i] is address (const type *) of an array (line) of DIM2 elements
a[i][j] is jth element in array a[i] of DIM2 elements
&a[i][j] or a[i]+j is DIM2*i+j elements after address a
A function with array parameter needs all dimensions except first
⇒ must declare as funtype f(eltype t[][DIM2]);

char t[12][4] = { "jan", "feb", ..., "dec" };
- t is matrix (2-D char array); t uses 12 * 4 bytes
char *p[12] = { "jan", "feb", ..., "dec" };
- p is array of pointers (0x460 → "jan", 0x5C4 → "feb", ..., 0x9FC → "dec")
- p uses 12 * sizeof(char *) bytes (+ 12*4 bytes for the string constants)
Assigning a string to an element of p (element 7 from pointer array p) changes an address
Assigning a string to a line of t is WRONG: the line is a constant address (of line 7); can do strcpy (or strncpy) into it

Declare index in loop header whenever possible (since C99): enforces scope, visually clear, avoids affecting other loops
Do use indices if more suggestive, though combinations are possible, e.g. a function with parameters (m, n, p, a[m][n], b[n][p], c[m][p]) whose body starts with for (i = 0; i < ...

Cannot dereference a void * (don't know what it points to), but can assign to / from pointer of any other type
Any pointer is OK as arg / result for function declared with void *

The cast is a unary operator, written as (type-name) expression
the value
of expression is converted to the type type-name
- convert int to real: (double) cnt
- dereference a void *: *(type *)p

typedef is a keyword used to define a new name for a type
Syntax: typedef declaration; the identifier becomes a type name
e.g. typedef uint16_t u16; or a typedef naming an array type line, after which line text ...; declares variables of that type

A function name is its address (a pointer), like for arrays
We can declare pointers of function type. Compare:
int fct( );      declares a function returning int
int (*pfct)( );  declares a pointer to a function returning int
declare function: restype fct(type1, ..., typeN);
declare pointer:  restype (*pfct)(type1, ..., typeN);
Can assign pfct = fct with the name of an existing function
Need parentheses for (*pointer), otherwise: int *fct( ); is a function returning int *
Function name is pointer ⇒ can call function using pointer

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
- address of array to sort, element count and size
- address of comparison function, returns int (< 0, 0, > 0)
- the comparison function has void * arguments, compatible with pointers of any type

  typedef int (*cmpfun_t)(const void *, const void *);   /* type name chosen here */
  int intcmp(int *p1, int *p2) { return *p1 - *p2; }
  int tab[] = { -6, 3, 2, -4, 0 };
  qsort(tab, 5, sizeof(int), (cmpfun_t)intcmp);
Can also declare function with void *, do cast in function:
  int intcmp(const void *p1, const void *p2) { return *(int *)p1 - *(int *)p2; }
  qsort(tab, 5, sizeof(int), intcmp);

When the language requires us to:
- arrays (memory blocks) cannot be passed to / returned from functions, only their addresses (array name is its address)
  addresses carry no size information ⇒ must pass size parameter
- strings: a string (constant or not) is a char *; need not pass size, since null-terminated
- functions: a function name is its address
When a function needs to modify a variable passed from outside: pass the address of the variable
Any address passed to a function needs to be valid (point to allocated memory): pointer arguments must be valid

Marius Minea, 21 November 2016

Declarative programming: specify what the program should do, not how;
in particular, without exposing internal implementation details or the computation flow
Main exponents:
- functional programming: still directly expresses formulas by which computations are done
- logic programming: problem domain expressed as logic rules / implications
- constraint programming: properties of solutions expressed as constraints over a given theory
Prolog: developed ca. 1970 by Alain Colmerauer et al. in Marseille
A
(pure) Prolog program is a list of clauses:
- a rule Head :- Body, where Body is a conjunction Predicate1, ..., PredicateN
- a fact Predicate, equivalent to Predicate :- true
:- means implication (from Body to Head)

First-order formulas: ¬α and α → β, where α, β are formulas; ∀v α, with v variable, α formula
Other usual connectors:
α ∧ β =def ¬(α → ¬β)   (AND)
α ∨ β =def ¬α → β      (OR)
existential quantifier: ∃x φ =def ¬∀x ¬φ
Compared to propositional logic: instead of propositions, predicates over terms

  desc(X, Y) :- child(X, Y).
  desc(X, Z) :- child(X, Y), desc(Y, Z).
  child(anna, jon).
  child(jon, peter).
  child(eve, jon).
  child(peter, mary).
Variables in clause head are universally quantified
Rest of variables in clause body are existentially quantified
∀X ∀Y  child(X, Y) → desc(X, Y)
∀X ∀Z  (∃Y (child(X, Y) ∧ desc(Y, Z))) → desc(X, Z)

Resolution is an inference rule that produces a new clause from two clauses with complementary literals (p and ¬p):
from p ∨ α and ¬p ∨ β, derive α ∨ β
The new clause = resolvent of the two clauses w.r.t. p
Example: res_p(p ∨ q ∨ ¬r, ¬p ∨ s) = q ∨ ¬r ∨ s
We use resolution to show that a formula is a contradiction: resolution is a method for proof by refutation

We have two formulas where a predicate may appear positive and negated:
∀x ∀y P(x, g(y)) and ∀z ¬P(z, a)
or
∀x ∀y P(x, g(y)) and ∀z ¬P(a, z)
Are these contradictory?
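One mechanical way to answer the question for the two pairs above is to try to make the positive and the negated atom identical by substituting terms for variables (unification, defined next). The following is a minimal sketch in C, not a full Prolog engine: the names Term, walk, occurs and unify are illustrative assumptions, terms have at most two arguments, and bindings made before a failure are not undone.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 2   /* assumption: enough for P(_, _) and g(_) */

/* A term is either a variable or a function symbol applied to nargs
   arguments (a constant is a function symbol with no arguments). */
typedef struct Term {
    const char *name;
    int is_var;
    int nargs;
    struct Term *args[MAX_ARGS];
    struct Term *binding;   /* for a variable: the term substituted for it, or NULL */
} Term;

/* Follow variable bindings until an unbound variable or a functional term. */
Term *walk(Term *t) {
    while (t->is_var && t->binding != NULL)
        t = t->binding;
    return t;
}

/* Occurs check: does the unbound variable v appear inside term t? */
int occurs(Term *v, Term *t) {
    t = walk(t);
    if (t == v)
        return 1;
    for (int i = 0; i < t->nargs; ++i)
        if (occurs(v, t->args[i]))
            return 1;
    return 0;
}

/* Returns 1 if a and b can be unified (recording the substitution in the
   binding fields), 0 otherwise. */
int unify(Term *a, Term *b) {
    a = walk(a);
    b = walk(b);
    if (a == b)
        return 1;
    if (a->is_var) {                 /* variable against any term */
        if (occurs(a, b)) return 0;  /* would give an infinite term */
        a->binding = b;
        return 1;
    }
    if (b->is_var) {
        if (occurs(b, a)) return 0;
        b->binding = a;
        return 1;
    }
    /* two functional terms: same symbol, same arity, pairwise unifiable args */
    if (strcmp(a->name, b->name) != 0 || a->nargs != b->nargs)
        return 0;
    for (int i = 0; i < a->nargs; ++i)
        if (!unify(a->args[i], b->args[i]))
            return 0;
    return 1;
}
```

Building P(x, g(y)) and P(a, z) as Term values and calling unify on them succeeds, binding x to a and z to g(y), so the second pair is contradictory. For P(x, g(y)) and P(z, a) the call fails, because g(y) and the constant a are different function symbols.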
We may substitute a universally quantified variable with any term
⇒ in the second case, we may substitute x ↦ a, z ↦ g(y)
⇒ we obtain P(a, g(y)) and ¬P(a, g(y)), a contradiction
In the first case, we may not substitute y and obtain a from g(y)
interpretation: we may not assume that the arbitrary function g must also take the constant value a
This is precisely defined by substitution and unification

A substitution is a function that associates terms to variables: {x1 ↦ t1, ..., xn ↦ tn}
For example, f(x, g(y, z), a, t){x ↦ g(y), y ↦ f(b), t ↦ u} = f(g(y), g(f(b), z), a, u)
Obs: other encountered notations: xi/ti, or ti/xi
Usually postfix notation Tσ is used for substitution σ applied to term T
The composition of two substitutions is a substitution

Two terms t1 and t2 may be unified if there is a substitution σ that makes them equal: t1σ = t2σ
Such a substitution is called a unifier
Example: f(x, g(y)){x ↦ a} = f(a, g(y)) = f(a, z){z ↦ g(y)}
i.e., the substitution {x ↦ a, z ↦ g(y)} is a unifier
More generally: unification may be applied to a set of pairs of terms
The most general unifier is that from which any other unifier may be obtained by using another substitution
In resolution: having the clauses P(l1, l2, ..., ln) and ¬P(r1, r2, ..., rn), if we find a unifier for (l1, r1), ..., (ln, rn), we have a contradiction

Unification rules:
- A variable x may be unified with any term t if x does not occur in t
  not: x with f(g(y), h(x, z)) (substitution would lead to an infinite term)
- Two functional terms may be unified only if they have identical functions, and the term arguments may be pairwise unified
  in particular: only identical constants may be unified

Prolog execution can be seen in two ways:
- Match goal with head of rule or fact, until no more subgoals
- Apply resolution with negation of goal, until empty clause

Consider as goal: desc(X, peter)
A solution = a value for X that makes the predicate true
A formula is valid if its negation is a contradiction
We derive a contradiction using resolution
Write the negated goal: ¬desc(X, peter), i.e., desc(X, peter) is false for any X
Choose the first rule for unification (use fresh variables): desc(X1, Y1) ∨ ¬child(X1, Y1)
We get as resolvent ¬child(X, peter)                      X1=X, Y1=peter
Choose for unification the fact child(jon, peter) (nr 4)
We get as resolvent the empty clause (contradiction)      X=jon
Thus desc(X, peter) is not false for any X: desc(jon, peter) is true, X=jon is a solution
Continue for other solutions

We restart with the negated goal: ¬desc(X, peter)
We unify with rule 2 (renaming variables again): desc(X2, Z2) ∨ ¬child(X2, Y2) ∨ ¬desc(Y2, Z2)
We get: ¬child(X, Y2) ∨ ¬desc(Y2, peter)                  X2=X, Z2=peter
We unify with child(anna, jon) (nr 3)                      X=anna, Y2=jon
We get as resolvent ¬desc(jon, peter)
We've already seen that desc(jon, peter) leads to the empty clause
⇒ X=anna is another solution for the initial question
If goal has variables, Prolog searches for all unifications / substitutions
With no variables, determines if predicate is true

Use constant nil and binary function c(H, T) to model lists
Model n-ary function with (n+1)-ary predicate (relation between args and result)
Model tail-recursive call using same variable in the result position
  rev3(nil, R, R).
  rev3(c(H, T), Ac, R) :- rev3(T, c(H, Ac), R).
  rev(L, R) :- rev3(L, nil, R).
With goal rev(c(1, c(2, c(3, nil))), X) we get X = c(3, c(2, c(1, nil)))
Derivation:
  rev(c(1, c(2, c(3, nil))), X)           L1=c(1,c(2,c(3,nil))), R1=X
  rev3(c(1, c(2, c(3, nil))), nil, X)     H1=1, T1=c(2,c(3,nil)), Ac1=nil
  rev3(c(2, c(3, nil)), c(1, nil), X)     H2=2, T2=c(3,nil), Ac2=c(1,nil)
  rev3(c(3, nil), c(2, c(1, nil)), X)     H3=3, T3=nil, Ac3=c(2,c(1,nil))
  rev3(nil, c(3, c(2, c(1, nil))), X)     X=c(3,c(2,c(1,nil)))

Testing Object-Oriented Software, 23 November 2016

Each subclass creates a new context for inherited features
⇒ correctness of superclass does not guarantee that of subclass
Q: Do superclass methods work correctly within context of subclass?
For B inheriting method m from A, we should know:
1. can we completely skip re-testing B.m?
2. are the test cases for A.m enough?
3. or do we need new test cases? which?

Substitution principle: subclass can be used anywhere instead of superclass
pre(m, Class) ⇒ pre(m, SubClass)
post(m, SubClass) ⇒ post(m, Class)
inv(SubClass) ⇒ inv(Class)
But: we must specify invariants to check them

  class Rectangle {
    int height, width;
    void setHeight(int value) { height = value; }
    void setWidth(int value) { width = value; }
    int getArea() { return height * width; }
  }
  class Square extends Rectangle {
    void setHeight(int value) { super.setHeight(value); super.setWidth(value); }
    void setWidth(int value) { super.setWidth(value); super.setHeight(value); }
  }

Interactions are complex: are there undesired interactions between methods?
Polymorphism and dynamic binding increase number of execution paths and make static analysis more difficult:
void foo(A obj) { obj.m(); } could call method m for any subclass of A
Encapsulation limits state observability when testing
Many small components increase the potential for misunderstanding and error; control spread throughout the program is difficult to follow

[McGregor & Sykes] Due to fundamental language constructs:
Objects
- information hiding ⇒ harder to observe state in testing
- have persistent state ⇒ inconsistency can cause errors later
- have a lifetime ⇒ errors when constructed / destructed at wrong time
Methods
- important for testing object interactions
- may be called in improper object state
- have parameters (used / updated): are those in the right state? do they correctly implement their interfaces?
(subtyping errors)

Behavioral specification: preconditions for correct behavior may be handled in two ways:
- contract-based: assumed
- defensive programming: checked
⇒ influences complexity of implementation and testing; simplifies / complicates class integration testing
Note: defensive programming should also check results! (although in practice, often receiver is considered trustworthy, only caller not)
- specification: method pre/postconditions, class invariants must be tested! Specification must also be validated
- implementation: error opportunities through constructors / destructors (incorrect initialization / deallocation)
- inter-class collaboration: members or object parameters may have errors
Does a client have the means to check preconditions? (hidden state?)

Inheritance:
- May propagate errors to descendants ⇒ stop through timely testing
- Typical OO code style: short methods, little processing, many calls ⇒ code / decision coverage loses relevance
- Offers a mechanism for test reuse, from super- to subclass
- Testing may detect inheritance just for code reuse, without inheriting specification
- Testing must check that the substitution principle is observed
From the perspective of observable states in program testing:
- Subclass keeps all observable states and transitions among them
- May add transitions (supplementary behavior)
- May add observable states (sub-states of initial ones)
Difficulty of understanding / testing the sequence of calls
⇒ likely error: call wrong method implementation from hierarchy
The class hierarchy is reflected in tests (general → specific)

Test adequacy axioms [Weyuker '86, '88], reformulated for OO by [Perry & Kaiser '90]
Antiextensionality: different implementations of the same functionality need different tests
1) A redefined method needs other / more tests (depending on code)
2) The same method when inherited needs different class-based tests
e.g.: A: +m(), +n()   B: +m()   C: +n()   m calls n()
⇒ C::m inherits B::m but calls another n() ⇒
different tests!

Antidecomposition: a test set adequate for a program need not be adequate for one of its components (it could be exercised in a different context to that program)
⇒ Adequate testing for a client is insufficient for a library (client could use only part of the functionality)
⇒ if deriving from a tested class, must still test inherited methods (code added may interact with the state ⇒ with inherited methods)

Anticomposition: a test set adequate for components need not be adequate for their combination
brief argument for sequential combination: p program paths in P and q paths in Q ⇒ p · q paths in P; Q (typically more than p + q), even more when execution alternates between P and Q
⇒ Unit / module testing cannot replace integration testing!
⇒ A method tested in the base class is not tested sufficiently in the derived class (it may be composed in different ways)

General multiple change: programs with the same control flow but different operations / values need different test suites

Set class with methods:
  add(element)      // precondition: element not in set
                    // raise Duplicate exception otherwise
  remove(element)
Testing: two consecutive add(x) raise exception, but element might still be added a second time
error discovered only with 2 x add, 2 x remove
harder to test than with directly observable object state

Problem: implementing a class requires understanding details and representation conditions of all base classes, to be sure of correct implementation
Two main classes of problems:
1) initialization: forgetting correct initialization of superclass
2) forgetting redefinition of method accounting for class specifics: copy methods or isEqual

Q: what are relevant object / method combinations to consider?
- target methods: all callable method implementations
- receiver classes: all possible receiver classes
Example [Rountev, Milanova, Ryder 2004]
  class A { public void m() { } }
  class B extends A { public void m() { } }
  class C extends A { }
  A a; ... a.m();
target-methods: test calls to A.m(), B.m()
receiver-classes (more comprehensive): test a of type A, B, C

Deriv used inconsistently also as Base
e.g.: Stack (access at one end) derived from Vector (indexed access)
using Vector::removeAt(idx) on Stack violates class invariant
Cause: design error
Detection: test class invariants

Testing levels: intra- and inter-method, intra- and inter-class
Observability (problems caused by encapsulation):
- explicit flattening of class hierarchy
- better: allowing data access by testing framework
- or: use getter methods to access state
Polymorphism: tests need to instantiate all possible subtypes for an object declared as a base type
- static analysis to find all possibilities (class hierarchy analysis)
Data-flow testing: data and changed state are important; line / branch coverage gives little info on small method bodies
Coupling: defined by def-use pairs between methods, i.e. a member defined (written) in m1() and used (read) by m2()
- used to select methods that are tested together

Distinguish: tests starting from specification (S) or implementation (I) (code)
S: new tests for old methods, when specification changes
S: new postconditions / invariants for old tests in derived classes
I: new tests for new methods, depending on desired coverage
Example:
- Change m() in superclass: re-test m() + dependent methods; re-test m() in context of subclass
- Change subclass: retest inherited methods that could interact
- Overwrite m(): augment tests of Base::m for adequate coverage
- Overwrite m() used by Base::n: test n in subclass
- Change of interface (abstract class): re-test whole hierarchy

At method level:
- Category Partition (I/O analysis, partitioning / equivalence)
- Combinational Function Test
(condition coverage)
- Recursive Function Test
- Polymorphic Message Test (client of a polymorphic server)
At class level:
- Invariant Boundaries
- Nonmodal Class Test (class w/o sequencing constraints)
- Modal Class Test (class with sequencing constraints)
- Quasi-Modal Class Test (constraints dependent on state)
For reusable components:
- Abstract Class Test (interface)
- Generic Class Test (parameterized)
- New Framework Test
- Popular Framework Test (changes in an API)

For a virtual method call (in a client), test all possible classes to which the call could be made
Need to deal with potential errors:
- incorrect preconditions on call for some subclasses
- call to unintended class (reference to unintended type)
- change of class hierarchy (affects code / tests)
Dynamic binding is similar to (multi-way) branch in code ⇒ covering all instances corresponds to branch coverage

Nonmodal class: accepts any method call in any state
e.g. DateTime accepts any sequence of get / set (use / def)
Types of test behavior:
- define-operation: set to valid input / check answer
- define-exception: set to invalid input / check answer
- define-exception-corruption: state not corrupt after exception
- use-exception-test: normal return after use
- use-correct-return: return with correct value after use
- use-corruption: object not corrupt after use

Modal class: class with fixed constraints on operation order
create a model with object states and transitions between them
Problems:
- missing transition: an operation is rejected in a valid state
- incorrect action / response for a method in a given state
- invalid resulting state: method causes transition to wrong state
- corrupt resulting state
- message accepted when it should be rejected
Quasi-modal class: method order constraints change depending on state
e.g. container / collection classes (full / empty), etc.
Typically, we'd like N+ coverage (any method in any state)

One approach:
- write class,
write tests, run (no other details / intermediate steps)
- good for simple classes in stable contexts
Another approach:
- run object from creation to destruction through all methods
- constructors
- accessors (get)
- predicates
- modifiers (set)
- iterators
- destructors

Verification of programs in practice
December 15, 2005
- Verification of Java programs (Pathfinder, ESC/Java, Bandera)
- Proof Carrying Code
- Combinations with static analysis
Formal verification, Supplement. Marius Minea

Java PathFinder: early effort to verify code written in usual programming languages
Java PathFinder 1.0: translation from Java to PROMELA (Spin)
- language similarities: treatment of dynamic object creation, threads
- missing aspects (floating point); Spin needs complete model / source
Java PathFinder 2.0: standalone verifier, written in Java [G. Brat, K. Havelund, S. Park, W. Visser '00]
General architecture:
- usual technique for representing large states (structures): each structural value stored only once and encoded as integer
- own JVM for model checking, with
  - exploration algorithms (issue forward / backward steps in own JVM)
  - nondeterministic environment using special methods captured by JVM

Verification techniques:
- for constructing program abstractions (by slicing)
- for identifying partial order reduction conditions, using the SVC (Stanford Validity Checker) theorem prover
- for detecting potential error conditions:
  - race conditions in access to shared variables
  - accessing semaphores in different order (potential deadlock)
Performance and verified systems
- coded in Java, 10x slower than Spin; speed: thousands of states/sec
- verified a control agent for state space operation
- a fragment of a distributed operating system (14 classes, 1 kloc)
Now a SourceForge project: http://javapathfinder.sourceforge.net

Source-level transformations generate
another Java program that operates on abstract predicates
Abstractions are expressed with a special annotation class Abstract:
  Abstract.remove(x)                  // abstracts away x
  Abstract.addBoolean("x0", x == 0)   // adds predicate x == 0
One can also abstract away predicates over several classes:
  Abstract.addBoolean("xGTy", A.x > B.y);
- generates a predicate for each object pair instantiated from classes A and B
- possible explosion in the number of needed predicates

ESC/Java: ESC = Extended Static Checking; initially for Modula 3, then Java
- not a model checker, but a static analyzer
- can detect errors such as null references, out-of-bound indices
- for more complex properties, annotations are used (invariants, preconditions, postconditions, null / non-null conditions)
- allows modular verification, separately for each method
- verification done using a theorem prover (Simplify)
- modules with unavailable source supplanted by specification files
Similar static analyzers exist for C (lint, evolved into splint)

Bandera: a modular verifier for Java programs
- a front-end for program simplification (program slicing)
- a library of frequently-used abstractions (the user specifies for each variable the desired abstraction)
- ability to restrict the model to a small number of module instances
- a generator for a finite model in a generic format (guarded command language, easily translated to other specifications)
- interfaces with usual verifiers (SMV, Spin, PVS theorem prover)
- a specification language, and support for formulas based on patterns

Proof-Carrying Code [Necula & Lee '96, Necula '97]
- A method for safe execution of untrusted code, e.g. net applets
- code consumer defines a set of safety rules
- code producer delivers the code coupled with a formal proof that it satisfies the
safety rules
- consumer uses a simple proof checker to establish validity of code and received proof
Key idea: checking a proof is a simple mechanical task (simple checker; small, verifiable trusted code base), while generating a proof is hard (burden falls on code producer)

Components:
- certifying compiler, generates native code with annotations (e.g. invariants)
- verification condition generator (VCGen); VC = predicate whose validity guarantees safe execution
- proof generator (starting from VCGen)
- a proof checker: validates correspondence between received code and proof (possibly assisted by a VCGen identical to producer's)

- The notion of VC: linked to the rules for program correctness based on preconditions and postconditions established by Floyd (1967)
- a VC is not necessarily the weakest precondition (can be simpler, easier to express and prove)
[Example: verification condition for a loop for (i = 0; i < ...), with a premise, the typing assumption s : int, and a postcondition involving saferd(mem, a + i)]

- Checking is simple for the consumer, who need only trust the proof checker
- Method is tamper-proof: no change in code and/or proof can go undetected (if a changed program checks, it is still safe)
- Verification is performed once, statically; allows subsequent safe execution without inserting run-time checks
- Allows to trust the compiler completely and even to debug it during the development process

CCured [Necula et al. '01]: C programs correct from typing point of view
- combination of type inference and runtime checking
- type inference used to establish as much as possible of the code as being type-safe
- run-time checks are inserted in the rest of the source to ensure correctness of memory access
- basic idea: extending the type system with pointers qualified as safe (just dereferenced), seq (for arrays) and dynamic (any access)
- additional
fields (base and length) are introduced for pointers which are not safe - slowdown factor: 1-2 (compared to 10-100 for Purify) - analysis also allows detection of errors Formal verification Supplement Marius Minea Verification of programs in practice 12 Context: lightweight and semi-formal methods: sacrifice partofsound-ness completeness guarantees to increase practicai applicability Metacompilation [Engler et al ’OO]: for bugs in operating system code - allows to check well-defined high-level (APi) rules, (e g , "variable x is protected by semaphore s", "interrupts must be reenabled") => by defining a meta-semantics accessible to the compiler - rules are specified as automata which transition while analyzing a relevant pattern of source code - an augmented C compilet applies these extensions to semantic analysis (propagation through the control flow graph, locally or globally) - results: hundreds of errors in Linux, OpenBSD, Exokernel, etc Related approach: automatic extraction of models from source code - by slicing (also for operating systems) - models are then analyzed by a model checker Formal verification Supplement Marius Minea Verification of programs in practice Verification of programs in practice December 15, 2005 -Verification of Java programs (Pathfinder, ESC Java, Bandera) - Proof Carrying Code - Combinations with static analysis Formal verification Supplement Marius Minea early effort to verify code written in usual programming languages Java PathFinder 1 0: translation from Java to PROMELA (Spin) - language similarities: treatment of dynamic object creation, threads - missing aspects (floating point); Spin needs complete model source Java PathFinder 2 0: standalone verifier, written in Java [G Brat, K Havelund, S Park, W Visser ’OO] General architecture: - usual technique for representing large States (structures): each structural value stored only once and encoded as integer for model checking, with - exploration algorithms (issue forward backward 
steps in own JVM) - nondeterministic environment using special methods captured by JVM Formal verification Supplement Marius Minea Verification techniques: - for constructing program abstractions (by slicing) - for identifying partial order reduction conditions using the SVC (Stanford Validity Checker) theorem prover , for detecting potential error conditions: - race conditions in access to shared variables - accessing semaphores in different order (potential deadlock) Performance and verified systems - coded in Java, lOx slower than Spin; speed: thousands of states sec - verified a control agent for state space operation - a fragment of a distributed operating system (14 classes, 1 kloc) NOW a SourceForge project: http:  javapathfinder sourceforge net Formal verification Supplement Marius Minea Verification of programs in practice 4 Verification of programs in practice Verification of programs in practice Source-level transformations generate another Java program that op-erates on abstract predicates Abstractions is expressed as special annotation class Abstract; Abstract remove(x)    abstracts away x Abstract addBoolean("xO", x == 0)    adds predicate x == 0 One can also abstract away predicates over seve ral classes: Abstract addBoolean("xGTy", A x > B y); - generates a predicate for each object pair instantiated from classes A and В - possible explosion in the number of needed predicates ESC = Extended Static Checking; initially for Modula 3, then Java - not a model checker, but a static analyzer - can detect errors such as nuli references, out-of-bound indices - for more complex properties are used (invariants, preconditions, postconditions, null non-null conditions) - allows modular verification, separately for each method - verification done using a theorem prover (Simplify) - modules with unavailable source supplanted by specification files Similar static analyzers exist for C (lint, evolved into splint) A modular verifier for Java programs - o front-end 
for program simplification (program slicing)
- a library of frequently-used abstractions (the user specifies for each variable the desired abstraction)
- ability to restrict the model to a small number of module instances
- a generator for a finite model in a generic format (guarded command language, easily translated to other specifications)
- interfaces with usual verifiers (SMV, Spin, PVS theorem prover)
- a specification language, and support for formulas based on patterns

Proof-Carrying Code [Necula & Lee '96, Necula '97]
- a method for safe execution of untrusted code, e.g. net applets
- code consumer defines a set of safety rules
- code producer delivers the code coupled with a formal proof that it satisfies the safety rules
- consumer uses a simple proof checker to establish validity of code and received proof
Key idea: checking a proof is a simple mechanical task (simple checker; small, verifiable trusted code base), while generating a proof is hard (burden falls on code producer)
- certifying compiler: generates native code with annotations (e.g. invariants)
- verification condition generator (VCGen); VC = predicate whose validity guarantees safe execution
- proof generator (starting from VCGen)
- proof checker: validates correspondence between received code and proof (possibly assisted by a VCGen identical to the producer's)
- the notion of VC: linked to the rules for program correctness based on preconditions and postconditions established by Floyd (1967)
- a VC is not necessarily the weakest precondition (can be simpler, easier to express and prove)
(Example slide: a for loop with its premise, postcondition and verification condition; the formulas are not recoverable from the source.)
- simple for the consumer, who need only trust the proof checker
- method is tamper-proof: no change in code and/or
proof can go undetected (if a changed program checks, it is still safe)
- verification is performed once, statically; allows subsequent safe execution without inserting run-time checks
- allows trusting the compiler completely, and even debugging it during the development process

[Necula et al. '01]: C programs correct from the typing point of view
- combination of type inference and run-time checking
- type inference used to establish as much as possible of the code as being type-safe
- run-time checks are inserted in the rest of the source to ensure correctness of memory access
- basic idea: extending the type system with pointers qualified as safe (just dereferenced), seq (for arrays) and dynamic (any access)
- additional fields (base and length) are introduced for pointers which are not safe
- slowdown factor: 1-2 (compared to 10-100 for Purify)
- analysis also allows detection of errors

Marius Minea, marius@cs.upt.ro, 22 November 2016

When the language forces us to:
- arrays (memory blocks) cannot be passed to / returned
from functions, only their addresses (array name is its address)
  addresses carry no size information => must pass size parameter
- strings: a string (constant or not) is a char *
  need not pass size, since null-terminated
- functions: a function name is its address
When we want a level of indirection:
  changing value at a pointer is visible to all who have the pointer (like web URL vs page content)
When a function needs to modify a variable passed from outside: pass the address of the variable

Functions and their arguments
=> any pointer passed to a function must be valid (point to allocated memory)
If a function needs arrays only for internal (temporary) use, one can use (since C99) a variable-length array of n elements, n known at runtime: T a[n];
But, if the function has an array result, the array must be allocated by the caller and passed as a parameter (including its length; the function has no way of knowing it!)
see examples: add two vectors, multiply two matrices

The more flexible the inputs, the higher the effort for the caller:
- concatenate array of strings: caller must precompute length
- multiply two bignums: caller must compute size of product
also, function is less natural (has address of result as parameter)
=> would like called function to be able to allocate the result object

Dynamic allocation allows us to obtain (functions from stdlib.h) a memory block of the desired size:
    void *malloc(size_t size);             allocates size bytes
    void *calloc(size_t n, size_t size);   allocates n*size bytes set to 0
Return value: address of allocated memory, or NULL (insufficient memory) on error
Frequent use: dynamically allocate array of n objects of type T:
    T *p = malloc(n * sizeof(T));
    if (p) for (int i = 0; i < n; ++i) ...

Reading a line of arbitrary length, character by character:
    #define BLOCK 64
    char *getline(void) {
        char *tmp, *s = NULL;
        int c, cnt = 0, size = 0;
        for ( ; (c = getchar()) != EOF; ) {
            if (cnt >= size)            /* buffer full: grow by BLOCK */
                if (!(tmp = realloc(s, (size += BLOCK) + 1))) {
                    ungetc(c, stdin); break;
                }
            s = tmp; s[cnt++] = c;
            if (c == '\n') break;
        }
        if (s) { s[cnt++] = '\0'; s = realloc(s, cnt); }
        return s;
    }

Reading a line in chunks, with fgets:
    #define INCR 64
    char *getline(void) {
        char *line = NULL, *tmp;
        int len, sz = 0;
        do {
            if (!(tmp = realloc(line, sz + INCR))) return line;
            line = tmp;
            if (!fgets(line + sz, INCR, stdin)) {
                if (sz) break;
                else { free(line); return NULL; }
            }
            sz += (len = strlen(line + sz));
        } while (line[sz-1] != '\n' && len == INCR-1);
        return realloc(line, sz + 1);
    }

Allocating a matrix:
    pm = malloc(LIN * COL * sizeof(elemtype));
but what is the right type of the pointer for use
as matrix?
A matrix is an array of lines. A line is an array of COL elements.
By writing typedef elemtype line[COL]; (line is now a type name)
we see that the type of a pointer to a line is elemtype (*)[COL]
So for a pointer to a matrix (i.e., to its first line), we should write:
    elemtype (*pm)[5] = malloc(3 * 5 * sizeof(elemtype));
We could also write line *pm = ...
How to declare a function that returns such a type?
    elemtype (*f(int lin, int col))[] {   /* f is a placeholder: the name is lost in the source */
        elemtype (*pm)[col] = malloc(lin * col * sizeof(elemtype));
        for (int i = 0; i ...             /* rest of the function is lost in the source */

When the language forces us to:
- arrays: only their addresses can be passed; addresses carry no size information => must pass size parameter
- strings: a string (constant or not) is a char *
  need not pass size, since null-terminated
- functions: a function name is its address
When we want a level of indirection:
  changing value at a pointer is visible to all who have the pointer (like web URL vs page content)
When a function needs to modify a variable passed from outside: pass the address of the variable

Functions and their arguments
=> any pointer passed to a function must be valid (point to allocated memory)
If a function needs arrays only for internal (temporary) use, one can use (since C99) a variable-length array of n elements, n known at runtime: T a[n];
But, if the function has an array result, the array must be allocated by the caller and passed as a parameter (including its length; the function has no way of knowing it!)
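The rule that an array result must be allocated by the caller can be sketched for vector addition as follows. The name vec_add and the element type double are assumptions chosen for illustration; they do not come from the slides.

```c
#include <stddef.h>

/* Adds the n-element vectors a and b element by element into res.
   res must point to storage for at least n doubles, provided by the
   caller: the function cannot find out the array size on its own. */
void vec_add(const double *a, const double *b, double *res, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        res[i] = a[i] + b[i];
}
```

A caller would declare double r[3]; and call vec_add(a, b, r, 3); passing the length explicitly, since the address alone carries no size information.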
see examples: add two vectors, multiply two matrices

The more flexible the inputs, the higher the effort for the caller:
- concatenate array of strings: caller must precompute length
- multiply two bignums: caller must compute size of product
also, function is less natural (has address of result as parameter)
=> would like called function to be able to allocate the result object

Dynamic allocation allows us to obtain (functions from stdlib.h) a memory block of the desired size:
    void *malloc(size_t size);             allocates size bytes
    void *calloc(size_t n, size_t size);   allocates n*size bytes set to 0
Return value: address of allocated memory, or NULL (insufficient memory) on error
Frequent use: dynamically allocate array of n objects of type T:
    T *p = malloc(n * sizeof(T));
    if (p) for (int i = 0; i < n; ++i) ...

Reading a line of arbitrary length, character by character:
    #define BLOCK 64
    char *getline(void) {
        char *tmp, *s = NULL;
        int c, cnt = 0, size = 0;
        for ( ; (c = getchar()) != EOF; ) {
            if (cnt >= size)            /* buffer full: grow by BLOCK */
                if (!(tmp = realloc(s, (size += BLOCK) + 1))) {
                    ungetc(c, stdin); break;
                }
            s = tmp; s[cnt++] = c;
            if (c == '\n') break;
        }
        if (s) { s[cnt++] = '\0'; s = realloc(s, cnt); }
        return s;
    }

Reading a line in chunks, with fgets (a zero sentinel in the last byte tells whether the chunk was filled):
    #define INC 64
    char *getline(void) {
        char *line = NULL;
        int sz = 0;
        do {
            char *tmp = realloc(line, (sz += INC) + 1);
            if (tmp) line = tmp; else return line;
            line[sz-1] = 0;
            if (!fgets(line + sz-INC, INC+1, stdin)) {
                if (sz > INC) break;
                else { free(line); return NULL; }
            }
        } while (line[sz-1] && line[sz-1] != '\n');
        sz -= INC;
        return realloc(line, sz + strlen(line+sz) + 1);
    }

Allocating a matrix:
    pm = malloc(LIN * COL * sizeof(elemtype));
but what type do we need to use it as matrix pm[i][j]?
A matrix is an array of lines. A line is an array of COL elements.
By writing typedef elemtype line[COL]; (line is now a type name)
we see that the type of a pointer to a line is elemtype (*)[COL]
So we could write line *pm = ... or directly
    elemtype (*pm)[5] = malloc(3 * 5 * sizeof(elemtype));
How to declare a function that returns such a type?
    elemtype (*f(int lin, int col))[] {   /* f is a placeholder: the name is lost in the source */
        elemtype (*pm)[col] = malloc(lin * col * sizeof(elemtype));
        if (pm) for (int i = 0; i ...     /* rest of the function is lost in the source */

    int main(int argc, char *argv[]) {
        FILE *f;
        char buf[256];   /* size not preserved in the source; 256 is a placeholder */
        if (argc == 2 && (f = fopen(argv[1], "r"))) {
            while (fgets(buf, sizeof(buf), f)) fputs(buf, stdout);
            fclose(f);
        }
        return 0;
    }

Text files are files with human-readable content: .txt files, programs (.c, .c++), web pages (.html), .xml files, etc.
- contain characters grouped in lines terminated by \n
- conversions may occur in reading/writing text streams, e.g. end of line is \r\n in Windows vs \n in Unix
- the standard guarantees one-to-one correspondence if:
  text contains only printable chars, tab and newline
  no newline is immediately preceded by spaces
  last character is a newline
Binary files are not human-readable as character sequences: .exe, .mp3, though they may contain text: .doc, .pdf
- record internal data as-is
- the sequence of characters read is the same as was written
=> Any (text) file may also be opened as binary stream

Opening modes:
- r: open for reading (file must exist)
- w: open for writing (truncated to length 0 if existing, else created)
- a: open for appending (writing at end of file; created if nonexistent)
  writes go to end-of-file, regardless of using fseek
First character (r, w, a) of opening mode may be followed by:
- + (r+, w+, a+): open as stated, but can use for both input and output
  must position (fseek) for write after read, unless EOF
  must position or fflush for read after write
  a+: initial read position implementation-defined (glibc: at start)
- b: opens binary file (otherwise: text; no explicit text mode)
- x: (eXclusive) may be last char only in w mode
  file must not exist; no shared access allowed (if the system supports it)
Examples: rb+ (read/write, binary), wx, wb+x, w+x, etc.

    FILE *fopen(const char *pathname, const char *mode)
arg 1: file name (absolute or relative to current directory)
arg 2: mode string with: r, w, or a; optionally +, b, x
    FILE *f1 = fopen("...", "...");      // fixed name, avoid
    FILE *f2 = fopen(argv[2], "...");    // second arg in command line
    char name[...];
    if (scanf("...", name) == 1) {
        FILE *f3 = fopen(name, "...");
        if (f3) { ... }
    }
fopen returns NULL on error
Otherwise, the returned value (a FILE *) is what all other functions use: work with the stream (logical), not with the name (physical)

    int fclose(FILE *stream)
Writes any buffered data to disk, closes file
Returns 0 on success, EOF on error (tell user if save of precious data failed!)

Standard streams:
- stdin: standard input stream (default: from keyboard)
  getchar, scanf, etc. read from here
- stdout: standard output stream (default: to screen)
  putchar, printf, puts write here
- stderr: standard error stream (default: to screen)
These streams are automatically open when program runs
Write error messages to stderr, separate from output (results)!
From command line: can redirect standard streams to files:
  input: program < in.txt
  output: program > out.txt (will write to out.txt)
  both: program < in.txt > out.txt
Can also redirect from within program (with freopen)
Remember: can run command from C with system (in stdlib.h)

Character-based I/O:
    int fputc(int c, FILE *stream)
    int fgetc(FILE *stream)
    int ungetc(int c, FILE *stream)
Line-based I/O (one text line):
    int fputs(const char *s, FILE *stream)
    int puts(const char *s)
    char *fgets(char *s, int size, FILE *stream)
Formatted I/O (same as printf/scanf, from file in first arg):
    int fprintf(FILE *stream, const char *format, ...)
    int fscanf(FILE *stream, const char *format, ...)

Typical sequence for working with files (name on command line):
    int main(int argc, char *argv[]) {
        if (argc != 2) {
            fprintf(stderr, "...");
            return 1;
        }
        FILE *fp = fopen(argv[1], "...");
        if (!fp) { perror("..."); return errno; }
        ...
        if (fclose(fp)) { perror("..."); return errno; }
        return 0;
    }

    int feof(FILE *stream)     nonzero if at EOF
    int ferror(FILE *stream)   nonzero if file had errors
Do NOT loop while !
feof(f): EOF is not signaled when reaching the end, only when trying to read past it
loop while reading succeeds; if not, check feof(f) or ferror(f)

errno: global variable declared in errno.h
contains code of last error in a library function (illegal operation, file not found, not enough memory, etc.)
Function void perror(const char *s) from stdio.h prints user message s, a colon : and then the error description (same as given by char *strerror(int errnum) from string.h)

Read/write bytes as-is, without conversion, from/to binary streams:
    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
read/write to/from address ptr nmemb objects of size bytes each
just like repeated calls to fgetc/fputc
Return value: number of complete objects read/written
if smaller than requested, find reason from feof and ferror
Use fread/fwrite if the byte order is the same in memory and in file (as specified in docs for file format: bmp, jpg, zip etc.)
- big-endian, most significant byte first: 0xcafebabe = 0xca 0xfe 0xba 0xbe
- little-endian, least significant byte first: Intel x86 (0xbe 0xba 0xfe 0xca)
Otherwise, read/write number byte by byte, (de)compose in needed order

Reading and writing use the same file position indicator
    long ftell(FILE *stream)
returns position from start of file
    int fseek(FILE *stream, long offset, int whence)
Sets file position indicator to offset; 3rd arg is reference point: start (SEEK_SET), current point (SEEK_CUR), end (SEEK_END)
    void rewind(FILE *stream)
sets file position indicator to start
same as fseek(stream, 0L, SEEK_SET); clearerr(stream);
Use (re)positioning to skip parts of the file on reading, or to write a selected part
MUST use fseek/fflush when switching between read and write!
Positioning may not be possible in every file (e.g. stdin/stdout)
    int fflush(FILE *stream)
writes unwritten data buffers for the given file

Files (and standard input) contain chars, not EOF (the value EOF is there to distinguish it from any char!)
chars read by getchar or getc are unsigned char values converted to int, EOF is -1
variable read with getchar/getc must be int so it can fit either
scanf, fgets, fread read arrays of char
need no int, since they report end-of-file differently
EOF can never be in an array read (since it is not a char)
char may be signed:
- if reading char as int, compare to int: 0xFF, 0xDA, etc., or if declaring unsigned char buf[]
- if declared as char, compare with char: '\xFF', '\xDA', etc.

Marius Minea, marius@cs.upt.ro, 4 December 2017

A file is a data resource on persistent storage (e.g. disk)
File contents are typically sequences of bytes
A stream is a program's view (logical view) of a file, also as sequence of characters (bytes)
a communication "channel" between program and outside world

So far, we've used the stdin, stdout and stderr streams:
- stdin: getchar, scanf, fgets(..., stdin); default: from keyboard
- stdout: printf, puts; default: to screen
- stderr: default: to screen; different logical purpose (results vs errors)
all three have type FILE *
These are automatically open when program runs
(Figure: in.txt -> stdin -> program -> stdout -> out.txt)

Can redirect standard streams to files
without change, programs doing "usual" I/O work with files!
  input: program < in.txt
  output: program > out.txt (will write to out.txt)
  both: program < in.txt > out.txt
  stderr: program 2> err.txt (2 is stderr descriptor)
Remember: can run command from C with system (stdlib.h)

(Figure: the program uses fread, fscanf, ... on FILE *fi = fopen("in.txt", "r") and fwrite, fprintf, fputc, fputs on FILE *fo = fopen("out.txt", "w"))
To work with files, a program must:
- associate a stream with a file, by opening the file
  C uses the type FILE * to represent streams
- work with the stream (FILE *) just like with stdin/stdout, using the functions from stdio.h
- close the file
That's all we need to work with files!
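The open / work / close sequence can be sketched as a small function that counts the lines of a file. The name count_lines and the convention of returning -1 on failure are assumptions made for this illustration.

```c
#include <stdio.h>

/* Counts the '\n' characters in the file named path.
   Returns -1 if the file cannot be opened, read or closed. */
long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");        /* 1. open: associate a stream with the file */
    if (!f) return -1;

    long lines = 0;
    int c;                             /* int, not char: must also hold EOF */
    while ((c = fgetc(f)) != EOF)      /* 2. work with the stream */
        if (c == '\n') ++lines;

    int failed = ferror(f);            /* distinguish read error from end of file */
    if (fclose(f) == EOF) failed = 1;  /* 3. close, and check the result */
    return failed ? -1 : lines;
}
```

The loop condition tests the value returned by fgetc rather than feof, so it stops both at end of file and on a read error; ferror then tells the two cases apart.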
File name is 1st command line argument (check that argc is 2)
    int main(int argc, char *argv[]) {
        FILE *f;
        char buf[256];   /* size not preserved in the source; 256 is a placeholder */
        if (argc == 2 && (f = fopen(argv[1], "r"))) {
            while (fgets(buf, sizeof(buf), f)) fputs(buf, stdout);
            fclose(f);
        }
        return 0;
    }

Text files are files with human-readable content: .txt files, programs (.c, .c++), web pages (.html), .xml files, etc.
- contain characters grouped in lines terminated by \n
- conversions may occur in reading/writing text streams, e.g. end of line is \r\n in Windows vs \n in Unix
- the C standard guarantees one-to-one correspondence if:
  text contains only printable chars, tab and newline
  no newline is immediately preceded by spaces
  last character is a newline
Binary files are not human-readable as character sequences: .exe, .mp3, though they may contain text: .doc, .pdf
- record data as-is
- the sequence of characters read is the same as was written
Any file (including text) may be opened as binary stream

Opening modes:
- r: open for reading (file must exist)
- w: open for writing (truncated to length 0 if existing, else created)
- a: open for appending (writing at end of file; created if nonexistent)
  writes go to end-of-file, regardless of using fseek
First character (r, w, a) of opening mode may be followed by:
- + (r+, w+, a+): open as stated, but can use for both input and output
  to write after a read, must set position (fseek), unless EOF
  to read after a write, must set position (fseek) or fflush
  a+: initial read position implementation-defined (glibc: at start)
- b: opens binary file (otherwise: text; no explicit text mode)
- x: (eXclusive) may be last char only in w mode
  file must not exist; no shared access allowed (if the system supports it)
Examples: rb+ (read/write, binary), wx, wb+x, w+x, etc.

    FILE *fopen(const char *pathname, const char *mode)
arg 1: file name (absolute or relative to current directory)
arg 2: mode string with: r, w, or a; optionally +, b, x
    FILE *f1 = fopen("...", "...");      // fixed name, avoid
    FILE *f2 = fopen(argv[2], "...");    // 2nd arg, check argc >= 3 first
    char name[...];
    if (scanf("...", name) == 1) {
        FILE *f3 = fopen(name, "...");
        if (f3) { ... }
    }
Returns a FILE * (a stream) used by all other functions
returns NULL on error

    int fclose(FILE *stream)
Writes any buffered data to disk, closes file
Returns 0 on success, EOF on error (tell user if save of precious
data failed!)

Character-based I/O:
    int fputc(int c, FILE *stream)
    int fgetc(FILE *stream)
    int ungetc(int c, FILE *stream)
Line-based I/O (one text line):
    int fputs(const char *s, FILE *stream)
    int puts(const char *s)
    char *fgets(char *s, int size, FILE *stream)
Formatted I/O (same as printf/scanf, from file in first arg):
    int fprintf(FILE *stream, const char *format, ...)
    int fscanf(FILE *stream, const char *format, ...)

Typical sequence for working with files (name on command line):
    int main(int argc, char *argv[]) {
        if (argc != 2) {
            fprintf(stderr, "...");
            return 1;
        }
        FILE *fp = fopen(argv[1], "...");
        if (!fp) { perror("..."); return errno; }
        ...
        if (fclose(fp)) { perror("..."); return errno; }
        return 0;
    }

    int feof(FILE *stream)     nonzero if at EOF
    int ferror(FILE *stream)   nonzero if file had errors
Do NOT loop while !feof(f): EOF is not signaled when reaching the end, only when trying to read past it
loop while reading succeeds; if not, check feof(f) or ferror(f)

errno: global variable declared in errno.h
contains code of last error in a library function (illegal operation, file not found, not enough memory, etc.)
void perror(const char *s): function from stdio.h, prints user message s, a colon : and then the error description (same as given by char *strerror(int errnum) from string.h)

Read/write bytes as-is, without conversion, from/to binary streams:
    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
read/write to/from address ptr nmemb objects of size bytes each
just like repeated calls to fgetc/fputc
Return value: number of complete objects read/written
if smaller than requested, find reason from feof and ferror
Use fread/fwrite if the byte order is the same in memory and in file (as specified in docs for file format: bmp, jpg, zip etc.)
- big-endian, most significant byte first: 0xcafebabe = 0xca 0xfe 0xba 0xbe
- little-endian, least significant byte first: Intel x86 (0xbe 0xba 0xfe 0xca)
Otherwise, read/write number byte by byte, (de)compose in needed order

Reading and writing use the same file position indicator
    long ftell(FILE *stream)
returns position from start of file
    int fseek(FILE *stream, long offset, int whence)
Sets file position indicator to offset; 3rd arg is reference point: start (SEEK_SET), current point (SEEK_CUR), end (SEEK_END)
    void rewind(FILE *stream)
sets file position indicator to start
same as fseek(stream, 0L, SEEK_SET); clearerr(stream);
Use (re)positioning to skip parts of the file on reading, or to write a selected part
MUST use fseek/fflush when switching between read and write!
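Reading and writing a number byte by byte in a fixed order, independent of the machine's own byte order, can be sketched as below. The function names write_u32_be and read_u32_be are assumptions for this illustration, not standard library functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes v most significant byte first (big-endian). Returns 0 on success, -1 on error. */
int write_u32_be(FILE *f, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        if (fputc((int)((v >> shift) & 0xFF), f) == EOF) return -1;
    return 0;
}

/* Reads four bytes, most significant first, and composes them into *out.
   Returns 0 on success, -1 on end of file or error. */
int read_u32_be(FILE *f, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        int c = fgetc(f);              /* int, so EOF can be told apart from a byte */
        if (c == EOF) return -1;
        v = (v << 8) | (uint32_t)c;
    }
    *out = v;
    return 0;
}
```

When the same stream is used for both directions, a positioning call such as rewind or fseek is needed between the write and the following read, as stated above.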
Positioning may not be possible in every file (e.g. stdin/stdout)
    int fflush(FILE *stream)
writes unwritten data buffers for the given file

Files (and standard input) contain chars, not EOF (the value EOF is there to distinguish it from any char!)
chars read by getchar or getc are unsigned char values converted to int, EOF is -1
variable read with getchar/getc must be int so it can fit either
scanf, fgets, fread read arrays of char
need no int, since they report end-of-file differently
EOF can never be in an array read (since it is not a char)
char may be signed:
- if reading char as int, compare to an int: 0xFF, 0xDA, etc., or if declaring unsigned char buf[]
- if declared as char, compare with a char: '\xFF', '\xDA', etc.

Preprocessing is done before actual compilation: cpp or gcc -E
    #define NAME replacement
    #define LEN 20
    #define NAME(arg1, ..., argn) replacement
    #define MAX(a,b) ((a)>(b)?(a):(b))
    #define NAME(arg1, arg2, ...) replacement
can use __VA_ARGS__ to refer to extra arguments
#ifdef, #ifndef, #undef: used in conditional compilation
    #define NEEDS_MATH_H
    #undef SOME_DEFINED_NAME

Macros are NOT variables
They are like find-replace in a text; the actual compiler never sees macros, just code after replacement
Be careful with macros!
Place args and body in parentheses (avoid precedence errors)
    #define SQR(a) ((a)*(a))
code might have: ... SQR(2+3) -> ... ((2+3)*(2+3))
all sets of parentheses are needed!
Don't use macros with side-effects if arg evaluated twice:
    #define MAX(x,y) ((x) > (y) ?
(x) : (y))
BAD use: MAX(++a,b)

Operators in macro replacements:
- #arg produces string literal for tokens represented by arg
- x ## y produces concatenation of tokens for x and y
    #define STR(s) #s
    #define STRSUB(s) STR(s)
    #define JOIN(x,y) x ## y
    #define SFMT(m) STRSUB(%JOIN(m,s))
    #define MAX 32
    scanf(SFMT(MAX), s);
expansion: SFMT(32) -> STRSUB(%JOIN(32,s)) -> STR(%32s) -> "%32s"

C preprocessor supports conditionals, using constant expressions
only the corresponding branch of the code will be compiled
    #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    uint16_t x = ...       (rest of the example is lost in the source)
#include <file>: search in system directories
#include "file": search current dir first, then system
#ifndef: e.g. to avoid multiple inclusion
    #ifndef MYHEADER_H
    #define MYHEADER_H
    ...
    #endif

Verification of security protocols

Protocol 1:
(1) A -> B: E_B(M)
(2) B -> A: E_A(M)
Attack 1:
(1) A -> B: E_B(M), intercepted by Z
(2) Z -> B: E_B(M)
(3) B -> Z: E_Z(M); Z decodes M
Protocol 2:
(1) A -> B: E_B(E_B(M)A)
(2) B -> A: E_A(E_A(M)B)
Attack 2:
(1) Z -> A: E_A(E_A(E_A(M)B)Z)
(2) A -> Z: E_Z(E_Z(E_A(M)B)A)
(3) Z decodes E_A(M)B, obtaining E_A(M)
(4) Z -> A: E_A(E_A(M)Z)
(5) A -> Z: E_Z(E_Z(M)A), thus Z has M

A fundamental problem: establishing a two-way secret communication channel between two entities
Symmetric encryption (secret key, shared key)
- shared key known only by the two participants
- decryption and encryption key related by a simple transform (conventionally considered as being the same)
- examples: Data Encryption Standard (1975, outdated), Triple DES, AES (Rijndael)
Asymmetric encryption (public key)
- each participant A has a pair of keys, one is the inverse of the other
- public key K_A, private key K_A^{-1}
- A sends K_B(K_A^{-1}(M)): only A can create, only B can decrypt
- ex.: Rivest-Shamir-Adleman (1977), El-Gamal

Formal verification, Lecture 11, January 19, 2006
- models of security protocols
- typical examples of protocols and attacks
- modeling in BAN logic
- verification methods

Intruders are "active": they can eavesdrop on the communication, acquiring messages and doing everything possible to decrypt them
An intruder:
a) can obtain every message from the network
b) is a
legitimate user of the system; in particular, s/he can initiate a conversation with any user
c) will have the opportunity to receive messages from any user
(more generally: any user B can become recipient for any user A)

[Dolev & Yao '83]: stressed the importance of clear assumptions in modeling, analysis and verification of protocols:
1) in a public key encryption system:
   a) encryption functions cannot be broken
   b) public key directory has guaranteed identity
   c) everybody has access to all public keys E_X, for all X
   d) only X has access to its decryption key D_X
2) A protocol between two entities does not require the assistance of a third for encryption or decryption
3) in a uniform protocol, all communicating parties use the same message format

Need for secret communication dates back to antiquity; likewise the discovery of ciphers and beginnings of cryptography
Security involves multiple aspects: authentication, authorization, integrity, confidentiality, nonrepudiation
Solutions are complex and reasoning about them difficult
Security of a protocol must not depend on the secrecy of the algorithm (no "security through obscurity")
- Subtle errors in existing (and widely used) protocols have been discovered after very long time (17 years, for Lowe's attack on Needham-Schroeder)
=> there are great risks in compromising a weak algorithm, unknown and thus unscrutinized by specialists
=> importance of formal verification even greater

[Lowe '95] finds an error in the public key version (after 17 years)
(1) A -> B: A, B, {Na, A}_Kb    A asks to communicate
(2) B -> A: B, A, {Na, Nb}_Ka   B replies with nonce Nb
(3) A -> B: A, B, {Nb}_Kb       A confirms reception
Attack with two concurrent sessions: A initiates session α with I; the latter impersonates A in session β with B
(α.1) A -> I: A, I, {Na, A}_Ki
(β.1) I(A) -> B: A, B, {Na, A}_Kb
(β.2) B -> I(A): B, A, {Na, Nb}_Ka
(α.2) I -> A: I, A, {Na, Nb}_Ka
(α.3) A -> I: A, I, {Nb}_Ki
(β.3) I(A) -> B: A, B, {Nb}_Kb

(4) B -> A: {Nb}_Kab
A confirms by retransmitting a message based on Nb (conventionally, decremented by 1 to avoid replay)
(5) A -> B: {Nb - 1}_Kab
Now, both participants know they can communicate

Dolev & Yao discuss two types of protocols, defined by their allowed operations:
1. Cascade protocols
   - encryption with any public key
   - decryption only with own key
2. Name stamp protocols; in addition:
   - appending a participant's name to a message
   - deleting a certain participant's name
   - deleting any name
Correctness problem becomes a rewriting problem for strings over an alphabet, decidable in polynomial time
- but undecidable for more complex problems

"Cryptography is not broken, it is circumvented" - A...
Clark & Jacob, "A Survey of Authentication Protocol Literature"
- freshness attacks (replay attacks)
  a message (or fragment) from an earlier communication is stored and inserted by the intruder in a new session
- type flaw attacks
  a message is composed of fields, each with a given type (data, nonce, participant name, key value)
  attack based on accepting a message with another type, but the same bit pattern, than the one initially sent

[Denning & Sacco, 1981]
Problem: an intruder who eavesdropped on a previous session can make B accept an old key, potentially compromised
Intruder I impersonates A (denoted I(A)) and sends the message from the earlier session, with the old key K'
(3) I(A) -> B: {K', A}_Kbs
(4) B -> I(A): {Nb}_K'
(5) I(A) -> B: {Nb - 1}_K'
Danger: I has practically unlimited time to compromise the old key
Correction: timestamps or extra nonce
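The timestamp correction can be sketched as a freshness test made by the receiver before accepting a key-carrying message. This is only an illustration: the name is_fresh, the use of seconds as unit and the symmetric acceptance window are assumptions, not part of the slides.

```c
#include <stdlib.h>

/* Hypothetical freshness test: accept a message stamped msg_time only if it
   differs from the receiver's clock (now) by at most window seconds.
   A replayed message carrying an old key fails this test. */
int is_fresh(long msg_time, long now, long window)
{
    return labs(now - msg_time) <= window;
}
```

With a window of 10 seconds, a message stamped 5 seconds ago is accepted, while a message recorded in an earlier session and replayed an hour later is rejected, so the intruder no longer has unlimited time to break the old key.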
protocols by which participants convince each other o...
- and either establish shared secrets (keys) for communication
- or recognize the use of partners' secret keys
- are the most widely studied security protocols in th...
Notations:
A, B: participants
S: authentication server
Na, Nb: "nonce" (from: number once) = random number generated to avoid reuse of old messages by an intruder
{X}...

Rules for components of a message (whoever said the whole said the parts):
$\frac{P \mid\equiv X,\ P \mid\equiv Y}{P \mid\equiv (X, Y)}$, $\frac{P \mid\equiv (X, Y)}{P \mid\equiv X}$, $\frac{P \mid\equiv Q \mid\sim (X, Y)}{P \mid\equiv Q \mid\sim X}$, $\frac{P \triangleleft (X, Y)}{P \triangleleft X}$
Decryption rules:
$\frac{P \mid\equiv P \stackrel{K}{\leftrightarrow} Q,\ P \triangleleft \{X\}_K}{P \triangleleft X}$
Bidirectionality of keys and secrets among participants:
$\frac{P \mid\equiv R \stackrel{K}{\leftrightarrow} R'}{P \mid\equiv R' \stackrel{K}{\leftrightarrow} R}$

BAN logic constructs:
- $P \mid\equiv X$: P believes X
- $P \triangleleft X$: P sees (receives, reads) message X
- $P \mid\sim X$: P said (sent) X sometime in the past
- $P \Rightarrow X$: P has jurisdiction over X; P is an authority on X (e.g., a key) and must be believed
- $\#(X)$: X is fresh (has not been sent so far)
- $P \stackrel{K}{\leftrightarrow} Q$: P and Q can use shared key K to communicate
- $\stackrel{K}{\mapsto} P$: P has public key K
- $P \stackrel{X}{\rightleftharpoons} Q$: X is a secret known only by P and Q
- $\{X\}_K$: message X encrypted with key K
- $\langle X \rangle_Y$: X combined with secret Y (for identification)

- parallel session attacks
  two or more concurrent sessions of the same protocol
  messages from a session used to attack another
- implementation-dependent attacks
  type flaw attacks can be eliminated if the representation of components contains redundancy to distinguish the t...
  interaction between protocol and encryption method (... of a bit in bitwise encryption)
- binding attacks (key integrity attacks)
  tampering with partner's public key (replacing it wit...
- and many others

(1) A -> S: A, B, Na
(2) S -> A: {Na, B, Kab, {Kab, A}_Kbs}_Kas
(3) A -> B: {Kab, A}_Kbs
(4) B -> A: {Nb}_Kab
(5) A -> B: {Nb - 1}_Kab
We idealize the protocol: instead of bit messages, formulas, corresponding to message meaning:
(1) Message 1 is only a request, has no logic value
(2) S -> A: $\{N_a, B, (A \stackrel{K_{ab}}{\leftrightarrow} B), \#(A \stackrel{K_{ab}}{\leftrightarrow} B), \{A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}\}_{K_{as}}$
(3) A -> B: $\{A \stackrel{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}$
(4) B -> A: $\{N_b, (A \stackrel{K_{ab}}{\leftrightarrow} B)\}_{K_{ab}}$ from
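B
(5) A -> B: $\{N_b, (A \stackrel{K_{ab}}{\leftrightarrow} B)\}_{K_{ab}}$ from A

Rules on meaning of messages:
- for shared keys: $\frac{P \mid\equiv Q \stackrel{K}{\leftrightarrow} P,\ P \triangleleft \{X\}_K}{P \mid\equiv Q \mid\sim X}$
- for public keys: ...
- for shared secrets: ...

... finite participants
Theorem provers do not have this limitation
Rewriting of reasoning in Prolog: Interrogator [Millen]
NRL Protocol Analyzer [Meadows et al.]
- combination of theorem-proving + model checking
- starts from an error state (should be inaccessible)
- searches backwards using inductive techniques
Athena [Song et al., CMU / Berkeley]
- representation using strand spaces based on causa... ...vidual executions
=> reduces state space significantly

Protocol is asynchronous composition between participants and intruder
intruder can listen to anything, and delete, change or ... according to its current knowledge set
State space is given by execution point for each participant and knowledge set of intruder (set of terms constructed i...
intruder is modeled by a relation $\vdash$ by which the intruder derives messages m from an initial set of information $\Gamma$:
- if $m \in \Gamma$ then $\Gamma \vdash m$
- concatenation: if $\Gamma \vdash m_1$ and $\Gamma \vdash m_2$ then $\Gamma \vdash m_1 \cdot m_2$
- projection: if $\Gamma \vdash m_1 \cdot m_2$ then $\Gamma \vdash m_1$ and $\Gamma \vdash m_2$
- encryption: if $\Gamma \vdash m$ and $\Gamma \vdash k$ then $\Gamma \vdash \{m\}_k$
- decryption: if $\Gamma \vdash \{m\}_k$ and $\Gamma \vdash k^{-1}$ then $\Gamma \vdash m$
Model checkers: FDR (for CSP), OFMC (on-the-fly ...-based)

We start with the assumptions (denote ...):
$A \mid\equiv A \stackrel{K_{as}}{\leftrightarrow} S$, $B \mid\equiv B \stackrel{K_{bs}}{\leftrightarrow} S$, $S \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$
$A, B \mid\equiv (S \Rightarrow A \stackrel{K}{\leftrightarrow} B)$, $A \mid\equiv (S \Rightarrow \#(A \stackrel{K}{\leftrightarrow} B))$
$A \mid\equiv \#(N_a)$, $B \mid\equiv \#(N_b)$, $S \mid\equiv \#(A \stackrel{K_{ab}}{\leftrightarrow} B)$
From $A \mid\equiv \#(N_a)$ and (2) we deduce: $A \mid\equiv S \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$, $A \mid\equiv S \mid\equiv \#(A \stackrel{K_{ab}}{\leftrightarrow}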
B)$
and from the jurisdiction rule: $A \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$, $A \mid\equiv \#(A \stackrel{K_{ab}}{\leftrightarrow} B)$
After receiving message (3) from A, we deduce: $B \mid\equiv S \mid\sim A \stackrel{K_{ab}}{\leftrightarrow} B$
We cannot obtain $B \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$ without the premise $B \mid\equiv \#(A \stackrel{K_{ab}}{\leftrightarrow} B)$
From the freshness of messages (4) and (5) we deduce $A \mid\equiv B \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$ and $B \mid\equiv A \mid\equiv A \stackrel{K_{ab}}{\leftrightarrow} B$, thus each participant believes that the key is valid, and that the other participant k...
Reasoning makes explicit the missing premise, which al... to substitute a compromised key

Problem in model checking: representing potentially in...
Theory generation (RVChecker, REVERE [Kindred & W...], BAN logic)
- a syntactic method of saturation-based theorem pr...
- produces a finite representation of a potentially infi... (all theorems generated from some premises and rule...
- termination based on limiting the application of tho... which can generate conclusions of larger size than pr...
Combination with model checking:
- a dubious premise found by theory generation: used ... attack
- conversely, a counterexample, modeled as logic ded... ...tify a dubious premise

- allows proving properties about a protocol
- if property can't be proved, there are serious reaso...
- can identify dubious / missing / non-explicit premises
But:
- monotone logic: an existing fact cannot be retracted
- cannot handle the notion of key confidentiality or ...promised (e.g. sending a key in plaintext)

6 December 2017

- from exploring the system
- from the specification
- from code
in all cases, we need a mapping from actions and responses of the model to inputs and responses of the system under test (SUT)
Example: Web Application Abstract Language [Buchler et al., KIT / TU München]
1) browser actions: FollowLink, ClickButton, SelectItems, ClickImage, gotoURL, inputText, MoveMouse, etc.
2) Mapping to actions to SUT:
   login(user, pwd) = selectItem(employeeList, user); inputText(passwordField, pwd); clickButton(login);
3) Mapping to actions of the testing framework (e.g., Selenium): HtmlUnit.findElement(), WebElement.click()
Informal: exploratory testing
e.g., model of a GUI (file editor) and generated program actions
Model building: manually
Conformance testing (system respects model?): automated
Formal: automata learning (active learning, Angluin algorithm)
generate input sequences, observing outputs
if two sequences σ1, σ2 cannot be distinguished by suffixes w up to a given length (σ1 w and σ2 w generate same outputs), consider they lead to the same state
Currently very successful in learning / testing network protocols

Example: phone switch [Kaner]
(Figure: state diagram with states such as Connected and On Hold, and transitions "You hung up", "Caller hung up")
Usually written by hand

PCI Local Bus Specification, 2004:
(Figure: state machine with states IDLE, B_BUSY, BACKOFF, S_DATA, TURN_AR)
"if a conflict exists between the specification and the state machines, the specification has precedence"
IETF Extensible Authentication Protocol (EAP), RFC 4137 (2005):
"Should a conflict exist between the interpretation of a state diagram and either the corresponding global transition tables or the textual description associated with the state machine, the state diagram takes precedence"

    do {   // Fragment of device driver [Ball & Rajamani '01]
        request = devExt->WriteListHeadVa;
        if (request && request->Status) {
            devExt->WriteListHeadVa = request->Next;
            irp = request->irp;
            if (request->Status > 0) {
                irp->IoStatus.Status = STATUS_SUCCESS;
                irp->IoStatus.Information = request->Status;
            } else {
                irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
                irp->IoStatus.Information = request->Status;
            }
            SmartDevFreeBlock(request);
            IoCompleteRequest(irp, IO_NO_INCREMENT);
            ...

    do {
        A: /* b == (nPackets == nPacketsOld) */
        if (*) {
            B: if (*) { skip; } else { skip; }
            /* choose(p1, p2) == p1 ? T : p2 ?
F : nondet */
    }
  }
Abstractions use Hoare rules / Dijkstra weakest preconditions

Model fields, representing relations between actual object fields
Each method: annotated with preconditions / postconditions / invariants, expressed in terms of the model
http:  kindsoftware com products opensource ESCJava2  ESCTools slides ETAPSTutorial 5 more jml pdf (p. 35-45)

January 19, 2006
Formal verification, Lecture 11 (Marius Minea): Verification of security protocols
- models of security protocols
- typical examples of protocols and attacks
- modeling in BAN logic
- verification methods

The need for secret communication dates back to antiquity,
likewise the discovery of ciphers and the beginnings of cryptography
Security involves multiple aspects:
authentication, authorization, integrity, confidentiality, nonrepudiation
Solutions are complex, and reasoning about them is difficult
Security of a protocol must not depend on the secrecy of the algorithm (no "security through obscurity")
- Subtle errors in existing (and widely used) protocols have been discovered after a very long time
  (17 years, for Lowe's attack on Needham-Schroeder)
=> there are great risks in using a weak algorithm that is unknown and thus unscrutinized by specialists
=> the importance of formal verification is even greater

A fundamental problem: establishing a two-way secret communication channel between two entities
Symmetric cryptography (secret key, shared key)
- shared key known only by the two participants
- decryption and encryption keys related by a simple transform (conventionally considered as being the same)
- examples: Data Encryption Standard (1975, outdated), Triple DES, AES (Rijndael)
Asymmetric cryptography (public key)
- each participant A has a pair of keys, one the inverse of the other
- public key $K_A$, private key $K_A^{-1}$
- A sends $K_B(K_A^{-1}(M))$: only A can create it, only B can decrypt it
- examples: Rivest-Shamir-Adleman (1977), El-Gamal
[Dolev & Yao '83] stressed the importance of clear assumptions in modeling, analysis and verification of protocols:
1) in a public key encryption system:
   a) encryption functions cannot be broken
   b) the public key directory has guaranteed identity
   c) everybody has access to all public keys $E_X$, $\forall X$
   d) only X has access to its decryption key $D_X$
2) A protocol between two entities does not require the assistance of a third for encryption or decryption
3) in a uniform protocol, all communicating parties use the same message format

Protocol 1:
(1) A -> B: $E_B(M)$
(2) B -> A: $E_A(M)$
Attack 1:
(1) A -> B: $E_B(M)$, intercepted by Z
(2) Z -> B: $E_B(M)$
(3) B -> Z: $E_Z(M)$; Z decodes M

Protocol 2:
(1) A -> B: $E_B(E_B(M)A)$
(2) B -> A: $E_A(E_A(M)B)$
Attack 2:
(1) Z -> A: $E_A(E_A(E_A(M)B)Z)$
(2) A -> Z: $E_Z(E_Z(E_A(M)B)A)$
(3) Z decodes $E_A(M)B$, obtaining $E_A(M)$
(4) Z -> A: $E_A(E_A(M)Z)$
(5) A -> Z: $E_Z(E_Z(M)A)$, thus Z has M

Intruders are "active": they can eavesdrop on the communication,
acquiring messages and doing everything possible to decrypt them
An intruder:
a) can obtain every message from the network
b) is a legitimate user of the system; in particular, he/she can initiate a conversation with any user
c) will have the opportunity to receive messages from any user
   (more generally: any user B can become a recipient for any user A)

Dolev & Yao discuss two types of protocols, defined by their allowed operations:
1. Cascade protocols
   - encryption with any public key
   - decryption only with own key
2. Name stamp protocols; in addition:
   - appending a participant's name to a message
   - deleting a certain participant's name
   - deleting any name
The correctness problem becomes a rewriting problem for strings over an alphabet,
decidable in polynomial time
- but undecidable for more complex
problems

Authentication protocols: protocols by which participants convince each other of their identity
- and either establish shared secrets (keys) for communication
- or recognize the use of their partners' secret keys
- are the most widely studied security protocols in the literature
Notations:
A, B: participants
S: authentication server
Na, Nb: "nonce" (from: number once) = random pattern (number) generated to avoid reuse of old messages by an intruder
$\{X\}_K$: message X encrypted with key K
[Needham & Schroeder '78] "Using Encryption for Authentication in Large Networks of Computers":
classic article, the first to predict the importance of formal verification methods

(1) A -> S: A, B, Na
    A announces to server S the intention to communicate with B
    (and guarantees freshness of the message with a nonce Na)
(2) S -> A: $\{N_a, B, K_{ab}, \{K_{ab}, A\}_{K_{bs}}\}_{K_{as}}$
    S sends to A the key Kab, together with an encrypted message which A will retransmit to B:
(3) A -> B: $\{K_{ab}, A\}_{K_{bs}}$
    B extracts the key Kab and answers A by sending a nonce Nb:
(4) B -> A: $\{N_b\}_{K_{ab}}$
    A confirms by retransmitting a message based on Nb
    (conventionally, decremented by 1 to avoid replay)
(5) A -> B: $\{N_b - 1\}_{K_{ab}}$
Now, both participants know they can communicate with Kab

[Denning & Sacco, 1981]
Problem: an intruder who eavesdropped on a previous session
can force B to accept an old, potentially compromised key
Intruder I impersonates A (denoted I(A)) and sends to B message (3) from the earlier session, with the old key Kc:
(3) I(A) -> B: $\{K_c, A\}_{K_{bs}}$
(4) B -> I(A): $\{N_b\}_{K_c}$
(5) I(A) -> B: $\{N_b - 1\}_{K_c}$
Danger: I has practically unlimited time to compromise (break) Kc
Correction: timestamps or an extra nonce

[Lowe '95] finds an error in the public key version (after 17 years!)
(1) A -> B: A, B, $\{N_a, A\}_{K_b}$    A asks to communicate, sends nonce Na
(2) B -> A: B, A, $\{N_a, N_b\}_{K_a}$    B replies with nonce Nb
(3) A -> B: A, B, $\{N_b\}_{K_b}$    A confirms reception
Attack with two concurrent sessions:
A initiates session $\alpha$ with intruder I; the latter impersonates A in session $\beta$ with B
($\alpha$.1) A -> I: A, I, $\{N_a, A\}_{K_i}$
($\beta$.1) I(A) -> B: A, B, $\{N_a, A\}_{K_b}$
($\beta$.2) B -> I(A): B, A, $\{N_a, N_b\}_{K_a}$
($\alpha$.2) I -> A: I, A, $\{N_a, N_b\}_{K_a}$
($\alpha$.3) A -> I: A, I, $\{N_b\}_{K_i}$
($\beta$.3) I(A) -> B: A, B, $\{N_b\}_{K_b}$
Discovered: with the FDR model checker for the CSP language
Correction: including the encrypted name of the sender in message (2)

"Cryptography is not broken, it is circumvented" - A. Shamir
Clark & Jacob, "A Survey of Authentication Protocol Literature", '97:
• freshness attacks (replay attacks)
  - a message (or fragment) from an earlier communication session is stored and inserted by the intruder in a new session
• type flaw attacks
  - a message is composed of fields, each with a given interpretation (data, nonce, participant name, key value)
  - the attack is based on accepting a message with another interpretation of the bit pattern than the one initially sent
• parallel session attacks
  - two or more concurrent sessions of the same protocol
  - messages from one session are used to attack another
• implementation-dependent attacks
  - type flaw attacks can be eliminated if the representation of message components contains redundancy to distinguish the type
  - interaction between protocol and encryption method (e.g., changing a bit in bitwise encryption)
• binding attacks (key integrity attacks)
  - tampering with the partner's public key (replacing it with the intruder's key)
• and many others

[Burrows, Abadi, Needham '89: "A logic of authentication"]
- most important method for modeling using logic
- a logic of belief, as opposed to a
logic of knowledge
- deals with what every participant believes is true
Goal: to express precisely
- initial assumptions about the workings of a protocol
- the final conclusions reached by the participants
Examples:
- what does the protocol achieve?
- does it need more assumptions than another protocol?
- does it encrypt something which is not necessary?

Notation:
$P \mid\equiv X$    P believes X
$P \triangleleft X$    P sees (receives, reads) message X
$P \mid\sim X$    P said (sent) X sometime in the past
$P \mid\Rightarrow X$    P has jurisdiction over X: P is an authority on X (e.g., a key) and must be believed
$\#(X)$    X is fresh (has not been sent so far)
$P \overset{K}{\leftrightarrow} Q$    P and Q can use shared key K to communicate
$\overset{K}{\mapsto} P$    P has public key K
$P \overset{X}{\rightleftharpoons} Q$    X is a secret known only by P and Q
$\{X\}_K$    message X encrypted with key K
$\langle X \rangle_Y$    X combined with secret Y (for identification)

[Inference rules: formulas not recoverable; slide titles only]
- Rules on the meaning of messages: for shared keys, for public keys, for shared secrets
- Rules on freshness of messages
- [...] P said the parts
- Decryption rules
- Bidirectionality of keys and secrets among participants

(1) A -> S: A, B, Na
(2) S -> A: $\{N_a, B, K_{ab}, \{K_{ab}, A\}_{K_{bs}}\}_{K_{as}}$
(3) A -> B: $\{K_{ab}, A\}_{K_{bs}}$
(4) B -> A: $\{N_b\}_{K_{ab}}$
(5) A -> B: $\{N_b - 1\}_{K_{ab}}$
We idealize the protocol: instead of bit messages, we send logical formulas corresponding to the message meaning:
(1) Message 1 is only a request, it has no logical value
(2) S -> A: $\{N_a, B, (A \overset{K_{ab}}{\leftrightarrow} B), \#(A \overset{K_{ab}}{\leftrightarrow} B), \{A \overset{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}\}_{K_{as}}$
(3) A -> B: $\{A \overset{K_{ab}}{\leftrightarrow} B\}_{K_{bs}}$
(4) B -> A: $\{N_b, (A \overset{K_{ab}}{\leftrightarrow} B)\}_{K_{ab}}$ from B
(5) A -> B: $\{N_b, (A \overset{K_{ab}}{\leftrightarrow} B)\}_{K_{ab}}$ from A

We start with the assumptions (denote $P \mid\equiv X$ and $Q \mid\equiv X$ by $P, Q \mid\equiv X$):
$A, B \mid\equiv (S \mid\Rightarrow A \overset{K}{\leftrightarrow} B)$
$A \mid\equiv (S \mid\Rightarrow \#(A \overset{K}{\leftrightarrow} B))$    (a good key is fresh)
$A \mid\equiv \#(N_a)$, $B \mid\equiv \#(N_b)$, $S \mid\equiv \#(A \overset{K_{ab}}{\leftrightarrow} B)$
From $A \mid\equiv \#(N_a)$ and (2) we deduce: $A \mid\equiv S \mid\equiv A \overset{K_{ab}}{\leftrightarrow} B$, $A \mid\equiv S \mid\equiv \#(A \overset{K_{ab}}{\leftrightarrow} B)$
and from the jurisdiction rule: $A \mid\equiv A \overset{K_{ab}}{\leftrightarrow} B$, $A \mid\equiv$
$\#(A \overset{K_{ab}}{\leftrightarrow} B)$
After receiving message (3) from A, we deduce: $B \mid\equiv S \mid\sim (A \overset{K_{ab}}{\leftrightarrow} B)$
[...]

[...] finite participants and sessions
Theorem provers do not have this limitation
Rewriting of reasoning in Prolog: Interrogator [Millen '87]
NRL Protocol Analyzer [Meadows et al.]
- combination of theorem proving + model checking
- starts from an error state (should be inaccessible)
- searches backwards using inductive techniques
Athena [Song et al., CMU / Berkeley]
- representation using strand spaces, based on causality and not individual executions
=> reduces the state space significantly

Marius Minea, marius@cs.upt.ro
5 December 2016

A structure groups (logically connected) elements of potentially different types
can use / assign / pass / return a structure value, or fields of it
structures are first-class values in C
  struct len { ...; ... unit ...; };
  struct len d1 = { 60, ...
Structures correspond to the cartesian product:
the set of possible values is the product of the component types
above: any real number with any 3-char string
  struct vect { ... x, y; } v1, v2;
  [Figure: memory layout; v1 holds v1.x, v1.y; v2 holds v2.x, v2.y]
Structure elements are called fields
of any type, but not the structure type itself (infinite recursion)
Access fields as: varname.fieldname    the dot is the postfix selection operator
  struct vect p1; p1.x = 2; p1.y = 3; printf("...", p1.x, p1.y);
Can write initializers, with or without field names:
  struct vect v1 = { 2, 3 }, v2 = { .x = 4, .y = 5 };
We may assign structures:
  struct vect v1 = {2, 3}, v2; v2 = v1;
Except for initialization, need a compound literal (type name in parentheses) for aggregate values:
  struct vect v3, v4;
  v3 = (struct vect){-4, 5};
  v4 = (struct vect){ .x = -1, .y = 2 };
Structures may be passed to and returned from functions
for large structures should pass / return pointers (less copying)
  struct vect add(struct vect v1, struct vect v2) {
    return (struct vect){ v1.x + v2.x, v1.y + v2.y };
  }
We may not compare structures with logical operators (==, !=)
must compare field by field: (v1.x == v2.x && v1.y == v2.y)
Reason: alignment in memory may cause spaces between fields
value of hidden bytes is undetermined => also don't use memcmp
In C, aggregated (compound) types may be combined arbitrarily:
arrays of structures, structures with array or structure fields, etc.
Define types to [...]
E.g. replace two related arrays of same range by array of
structures:
  char *name_mo[] = { ... };
  int day_mo[] = { 31, 28, 31, 30, ..., 30, 31 };
becomes
  struct month { char *name; ...; };
  struct month mo[] = {{ ..., 31}, ..., { ..., 31}};

typedef allows us to give new names to existing types
General form: typedef existing-type new-name;
(like a variable declaration with typedef in front: names a type, not a variable)
e.g.
  typedef struct vect vect_t;
  typedef ... (*...)(... *);
We can give the name directly in the type definition:
  typedef struct student { ... } student_t;
may omit the structure tag (after struct) and use just the new name:
  typedef struct { ... } student_t;
or separately define synonym and structure type (in either order):
  typedef struct student student_t;
  struct student { ... };

  typedef struct { char name[...]; char *addr; } student_t;
  student_t s;
s.name is an array: we can copy or read a string:
  CANNOT assign s.name = "...", it's a CONSTANT address!
  strcpy(s.name, "...");
  if (scanf("...", s.name) == 1) ...
s.addr is a pointer: we must assign a valid address
  e.g., a string constant: s.addr = "...";
  or dynamically allocated memory:
  if (fgets(buf, sizeof(buf), stdin)) s.addr = strdup(buf);
Field names are only visible inside the structure
cannot use fieldname by itself, only varname.field
=> different structure types can have fields with the same name

Like any variable, a structure can be accessed through a pointer:
  struct student s, *p = &s;
  (*p).final_grade = 9.50;
The operator -> is shorthand for indirection followed by selection:
  use: pointer->fieldname    means: (*pointer).fieldname
Use pointers to pass large structures as function arguments:
avoids needless copying of data onto the stack
Operators . and -> have the highest precedence, like () and []
  p->x++     means  (p->x)++       -> has priority
  ++p->x     means  ++(p->x)       -> has priority
  *p->x      means  *(p->x)        -> has priority
  *p->s++    means  *((p->s)++)    first ++ then * (right assoc.)

A structure field may not be a structure of the same type:
the size of the structure would be undefined / infinite
But it can hold the address of a structure of the same type (a pointer)
=> recursive data structures (lists, trees, etc.)
List of words:
  struct wl { char *word; struct wl *next; };
Binary tree with integer nodes:
  typedef struct t tree_t;
  struct t { int ...; tree_t *left, *right; };

We want compact, efficient representations
but don't use too restrictive assumptions!
(see Y2K problem)
date = 32-bit int: sec, min (0-59): 6 bits, hour (0-23), day (1-31): 5 bits,
month (1-12): 4 bits, year (1970 + 0-63): 6 bits
  struct date {
    unsigned sec : 6, min : 6;
    unsigned hour : 5, day : 5;
    unsigned month : 4;
    unsigned year : 6;
  } data = {0, 0, 17, 19, 5, 39};
We can directly write: printf("...", data.day, data.month);
Nameless fields can control the space used: unsigned : 2;
or force storing data starting in the next storage unit: int : 0;

The compiler aligns each data type in memory for best processor access
can find out with the _Alignof operator
  printf("...", _Alignof(...), ...(... *));
Structure fields are in order but need not be in consecutive bytes
offsetof(structuretype, fieldname) tells where (from stddef.h)
  typedef struct { ... s...; ... val; } s1_t;
  typedef struct { ... s...; ...; } s2_t;
  printf("...", offsetof(s1_t, val), sizeof(s1_t));
  printf("...", offsetof(s2_t, val), sizeof(s2_t));
If you define structures for easier work with certain file formats,
check that offsets are the same as in the file (no unused bytes)

Sometimes the size of an array field is not known statically
the last member of a structure may be an incompletely defined array
  typedef struct { ... *fname; ... argc; ... args[]; } func_t;
Declaring func_t f; is useless, the array has length 0 (no elements)
=> cannot initialize statically, or pass the struct as argument
But we can dynamically create a structure of the desired size
and pass a pointer to the struct as function argument
  func_t *fp = malloc(sizeof(func_t) + n * sizeof(...));
  if (fp) { fp->argc = n; for (... = 0; i ... args[i] = ...

enum and union: two other kinds of user-defined types
declaration: with keyword + tag + braces (similar to structures)
enum: just named integer values
union: declares a type which is the union of several types
  may contain one value of any of the types
enum gives names to integer values (constants)
=> use for [...] (names are more suggestive than ints)
  enum univ_mo {jan=1, feb, mar, apr, may, jun, oct=10, nov, ...
defines type enum univ_mo (the keyword is part of the type name)
Default: increasing sequence of values, starting at 0
Can explicitly specify values (restarts count); values may repeat
An enumeration type is an integer type => values used as ints
  typedef enum {Su, M, Tu, W, Th, F, Sa} day_t;
  ...; for (... = M; day ...

[...] we have int:int:rest ->
int:rest
we obtain a model with the effect of each instruction

Fragment of a device driver [Ball & Rajamani '01]
  do {
    request = devExt->WriteListHeadVa;
    if (request && request->status) {
      devExt->WriteListHeadVa = request->Next;
      irp = request->irp;
      if (request->status > 0) {
        irp->IoStatus.Status = STATUS_SUCCESS;
        irp->IoStatus.Information = request->Status;
      } else {
        irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
        irp->IoStatus.Information = request->Status;
      }
      SmartDevFreeBlock(request);
      IoCompleteRequest(irp, IO_NO_INCREMENT);
      ...
Boolean abstraction:
  do {
    A: /* b == (nPackets == nPacketsOld) */
    if (*) {
      B: if (*) { skip; } else { skip; }
      /* choose(p1, p2) == p1 ? T : p2 ? F : nondet */
    }
  }
Abstraction is done using Hoare rules / Dijkstra preconditions

"Fictitious" (model) fields represent relations between real fields in the code
Each method is annotated with preconditions / postconditions / invariants,
expressed relative to (the fields of) the model
http:  kindsoftware com products opensource ESCJava2  ESCTools slides ETAPSTutorial 5 more jml pdf (p. 35-45)

Marius Minea, marius@cs.upt.ro
12 December 2017

A structure groups (logically connected) elements of possibly different types
can use / assign / pass / return a structure value, or fields of it
structures are first-class values in C
  struct len { ...; ... unit ...; };
  struct len d1 = { 60, ...
Structures correspond to the cartesian product:
the set of possible values is the product of the component types
above: any real number with any 3-char string
  struct point { ... x, y; } p1, p2;
  [Figure: memory layout; p1 holds p1.x, p1.y; p2 holds p2.x, p2.y]
Structure elements are called fields
of any type, but not the structure type itself (infinite recursion)
Access fields as: varname.fieldname    the dot is the postfix selection operator
  struct point p1; p1.x = 2; p1.y = 3; printf("...", p1.x, p1.y);
Field names are only visible inside the structure
cannot use fieldname by itself, only varname.field
=> different structure types can have fields with the same name

Can write initializers, with or without field names:
  struct point p1 = { 2, 3 }, p2 = { .x = 4, .y = 5 };
We may assign structures:
  struct point p1 = {2, 3}, p2; p2 = p1;
Except for initialization, need a compound literal (type name in parentheses) for aggregate values:
  struct point p3, p4;
  p3 = (struct point){-4, 5};
  p4 = (struct point){ .x = -1, .y = 2 };
Structures may be passed to and returned from functions
for large structures should pass / return pointers (less copying)
  struct point add(struct point p1, struct point p2) {
    return (struct point){ p1.x + p2.x, p1.y + p2.y };
  }
We may not compare structures with logical operators (==, !=)
must compare field by field: (p1.x == p2.x && p1.y == p2.y)
Reason: alignment in memory may cause gaps between fields
value of hidden bytes is undetermined => also don't use memcmp

In C, aggregated (compound) types may be combined arbitrarily:
arrays of structures, structures with array or structure fields, etc.
Define types to [...]
E.g. replace two related arrays of same range by an array of structures:
  char *name_mo[] = { ... };
  int day_mo[] = { 31, 28, 31, 30, ..., 30, 31 };
becomes
  struct month { char *name; ...; };
  struct month mo[] = {{ ..., 31}, ..., { ..., 31}};

typedef declares new names for existing types
General form: typedef existing-type new-name;
(like a variable declaration with typedef in front: names a type, not a variable)
e.g.
  typedef ...;
  typedef struct point point_t;
  typedef ... (*...)(... *, ... *);
We can give the name directly in the type definition:
  typedef struct student { ... } student_t;
may omit the structure tag (after struct) and use just the new name:
  typedef struct { ... } student_t;
or separately define synonym and structure type (in either order):
  typedef struct student student_t;
  struct student { ... };

  typedef struct { char name[...]; char *addr; } student_t;
  student_t s;
s.name is an array: we can copy or read a string:
  CANNOT assign s.name = "...", it's a CONSTANT address!
  strcpy(s.name, "...");
  if (scanf("...", s.name) == 1) ...
s.addr is a pointer: we must assign a valid address
  e.g., a string constant: s.addr = "...";
  or dynamically allocated memory:
  if (fgets(buf, sizeof(buf), stdin)) s.addr = strdup(buf);

Like any data, a structure can be accessed through a pointer
The operator -> is shorthand for indirection followed by selection:
  use: pointer->fieldname    means: (*pointer).fieldname
  struct student s, *p = &s;
  p->final_grade = 9.50;
For large structures, use pointers as function arguments:
avoids needless copying of a large structure onto the stack
Declare the argument as const sometype *p if the function does not change the value
Operators . and -> have the highest precedence, like () and []
  p->x++     means  (p->x)++       -> has priority
  ++p->x     means  ++(p->x)       -> has priority
  *p->x      means  *(p->x)        -> has priority
  *p->s++    means  *((p->s)++)    first ++ then * (right assoc.)

A structure field may not be a structure of the same type:
the size of the structure would be undefined / infinite
But it can hold the address of a structure of the same type (a pointer)
=> recursive data structures (lists, trees, etc.)
List of words:
  struct wl { char *word; struct wl *next; };
Binary tree with integer nodes:
  typedef struct t tree_t;
  struct t { int ...; tree_t *left, *right; };

We want compact, efficient representations
but don't use too restrictive assumptions!
(see Y2K problem)
date = 32-bit int: sec, min (0-59): 6 bits, hour (0-23), day (1-31): 5 bits,
month (1-12): 4 bits, year (1970 + 0-63): 6 bits
  struct date {
    unsigned sec : 6, min : 6;
    unsigned hour : 5, day : 5;
    unsigned month : 4;
    unsigned year : 6;
  } data = {0, 0, 17, 19, 5, 39};
We can directly write: printf("...", data.day, data.month);
Nameless fields can control the space used: unsigned : 2;
or force storing data starting in the next storage unit: int : 0;

The compiler aligns each data type in memory for best processor access
can find out with the _Alignof operator
  printf("...", _Alignof(...), ...(... *));
Structure fields are in order but need not be in consecutive bytes
offsetof(structuretype, fieldname) tells where (from stddef.h)
  typedef struct { ... s...; ... val; } s1_t;
  typedef struct { ... s...; ...; } s2_t;
  printf("...", offsetof(s1_t, val), sizeof(s1_t));
  printf("...", offsetof(s2_t, val), sizeof(s2_t));
If you define structures for work with certain file formats,
check that offsets are the same as in the file (no unused bytes)

Sometimes the size of an array field is not known statically
the last member of a structure may be an incompletely defined array
  typedef struct { ... *fname; ... argc; ... args[]; } func_t;
Declaring func_t f; is useless, the array has length 0 (no elements)
=> cannot initialize statically, or pass the struct as argument
But we can dynamically create a structure of the desired size
and pass a pointer to the struct as function argument
  func_t *fp = malloc(sizeof(func_t) + sizeof(...[n]));
  if (fp) { fp->argc = n; for (... = 0; i ... args[i] = ...

enum and union: two other kinds of user-defined types
declaration: with keyword + tag + braces (similar to structures)
enum: just named integer values
union: declares a type which is the union of several types
  may contain one value of any of the types
enum gives names to integer values (constants)
=> use for [...] (names are more suggestive than ints)
  enum univ_mo {jan=1, feb, mar, apr, may, jun, oct=10, nov, ...
defines type enum univ_mo (the keyword is part of the type name)
Default: increasing sequence of values, starting at 0
Can explicitly specify values (restarts count); values may repeat
An enumeration type is an integer type => values used as ints
  typedef enum {Su, M, Tu, W, Th, F, Sa} day_t;
  ...; for (... = M; day ...

[...] $\alpha \to \beta$ where $\alpha$, $\beta$ are formulas
$\forall v\, \alpha$ with $v$
variable, $\alpha$ formula
Other usual connectors:
  $\alpha \land \beta \overset{def}{=} \neg(\alpha \to \neg\beta)$  (AND)
  $\alpha \lor \beta \overset{def}{=} \neg\alpha \to \beta$  (OR)
  existential quantifier: $\exists x\, \varphi \overset{def}{=} \neg\forall x\, \neg\varphi$
Compared to propositional logic: instead of propositions, predicates over terms

  desc(X, Y) :- child(X, Y).
  desc(X, Z) :- child(X, Y), desc(Y, Z).
  child(anna, jon).
  child(jon, peter).
  child(eve, jon).
  child(peter, mary).
Variables in the clause head are universally quantified
The rest of the variables in the clause body are existentially quantified
  $\forall X \forall Y\, (child(X, Y) \to desc(X, Y))$
  $\forall X \forall Z\, ((\exists Y\, (child(X, Y) \land desc(Y, Z))) \to desc(X, Z))$

Resolution is an inference rule that produces a new clause
from two clauses with complementary literals ($p$ and $\neg p$):
  from $p \lor \alpha$ and $\neg p \lor \beta$ derive $\alpha \lor \beta$
The new clause = the resolvent of the two clauses w.r.t. $p$
Example: $\mathrm{rez}_p(p \lor q \lor \ldots,\ \neg p \lor \ldots)$ [...]
We use resolution to show that a formula is a contradiction:
resolution is a method for proof by refutation

We have two formulas where a predicate may appear positive and negated:
  $\forall x \forall y\, P(x, g(y))$ and $\forall z\, \neg P(z, a)$
or
  $\forall x \forall y\, P(x, g(y))$ and $\forall z\, \neg P(a, z)$
Are these contradictory?
We may substitute a universally quantified variable with any term
=> in the second case, we may substitute $x \mapsto a$, $z \mapsto g(y)$
=> we obtain $P(a, g(y))$ and $\neg P(a, g(y))$, a contradiction
In the first case, we may not substitute $y$ and obtain $a$ from $g(y)$
interpretation: we may not assume that the arbitrary function $g$ must also take the constant value $a$
This is precisely defined by substitution and unification

A substitution is a function that associates terms to variables: $\{x_1 \mapsto t_1, \ldots, x_n \mapsto t_n\}$
For example, $f(x, g(y, z), a, t)\{x \mapsto g(y), y \mapsto f(b), t \mapsto u\} = f(g(y), g(f(b), z), a, u)$
Obs: other encountered notations: $x_i/t_i$, or $t_i/x_i$
Usually postfix notation $T\sigma$ is used for substitution $\sigma$ applied to term $T$
The composition of two substitutions is a substitution

Two terms $t_1$ and $t_2$ may be unified if there is a substitution $\sigma$ that makes them equal: $t_1\sigma = t_2\sigma$
Such a substitution is called a unifier
Example: $f(x, g(y))\{x \mapsto a\} = f(a, g(y)) = f(a, z)\{z \mapsto g(y)\}$
i.e., the substitution $\{x \mapsto a, z \mapsto g(y)\}$ is a unifier
More generally: applied to a set of pairs of terms
The most general unifier is that from which any other unifier may be obtained by using another substitution
In resolution: having the clauses $P(l_1, l_2, \ldots, l_n)$ and $\neg P(r_1, r_2, \ldots, r_n)$, if
we find a unifier for the pairs $(l_i, r_i)$, we have a [...]

A variable x may be unified with any term t if x does not occur in t
not: x with f(g(y), ...(x, z)) (the substitution would lead to an infinite term)
Two functional terms may be unified only if they have identical functions,
and the term arguments may be pairwise unified
in particular: only identical constants may be unified

Prolog execution can be seen in two ways:
- Match the goal with the head of a rule or fact, until there are no more subgoals
- Apply resolution with the negation of the goal, until the empty clause is obtained

Consider as goal: desc(X, peter)
A solution = a value for X that makes the predicate true
A formula is valid if its negation is a contradiction
We derive a contradiction using resolution
Write the negated goal: $\neg$desc(X, peter)
i.e., desc(X, peter) is false for any X
Choose the first rule for unification (use fresh variables):
  desc(X1, Y1) $\lor$ $\neg$child(X1, Y1)
We get as resolvent $\neg$child(X, peter)    X1=X, Y1=peter
Choose for unification the fact child(jon, peter) (nr 4)
We get as resolvent the empty clause (contradiction)    X=jon
Thus desc(X, peter) is not false for every X:
desc(jon, peter) is true, so X=jon is a solution
Continue for other solutions

We restart with the negated goal: $\neg$desc(X, peter)
We unify with rule 2 (renaming variables again):
  desc(X2, Z2) $\lor$ $\neg$child(X2, Y2) $\lor$ $\neg$desc(Y2, Z2)
We get: $\neg$child(X, Y2) $\lor$ $\neg$desc(Y2, peter)    X2=X, Z2=peter
We unify with child(anna, jon)
(nr 3)    X=anna, Y2=jon
We get as resolvent $\neg$desc(jon, peter)
We've already seen that desc(jon, peter) leads to the empty clause
=> X=anna is another solution for the initial question
If the goal has variables, Prolog searches for all unifications / substitutions
With no variables, it determines if the predicate is true

Use the constant nil and the binary function c(H, T) to model lists
Model an n-ary function with an (n+1)-ary predicate (a relation between arguments and result)
Model a tail-recursive call using the same variable in the result position
  rev3(nil, R, R).
  rev3(c(H, T), Ac, R) :- rev3(T, c(H, Ac), R).
  rev(L, R) :- rev3(L, nil, R).
With goal rev(c(1, c(2, c(3, nil))), X) we get X = c(3, c(2, c(1, nil)))
Derivation:
  rev(c(1, c(2, c(3, nil))), X)             L1=c(1,c(2,c(3,nil))), R1=X
  rev3(c(1, c(2, c(3, nil))), nil, X)       H1=1, T1=c(2,c(3,nil)), Ac1=nil
  rev3(c(2, c(3, nil)), c(1, nil), X)       H2=2, T2=c(3,nil), Ac2=c(1,nil)
  rev3(c(3, nil), c(2, c(1, nil)), X)       H3=3, T3=nil, Ac3=c(2,c(1,nil))
  rev3(nil, c(3, c(2, c(1, nil))), X)       X=c(3,c(2,c(1,nil)))

20 December 2017

Testing has repetitive components, so automation is justified
The problem is the cost of automation [Kaner]
Time for: test creation, checking their functionality, documentation
Is automation reusable? (if the program evolves)
Is maintenance needed? (GUI change, internationalization)
Does it delay finding bugs? (fewer resources to run tests)
Does it find enough bugs? Or are most found by manual testing?
Is it powerful enough? Or does it automate only "easy" tests?
1) Record user actions (capture) and the resulting screen (bitmap)
   => most primitive level
   - other checks: with tester effort (interrupt / insert)
   - fragile: susceptible to any product change
   - possible comparison errors in the resulting image
2) Script with [...] (select menu / button)
   - more flexible, but does not check graphic layout
     (low level: font, text size / overwrite, etc.)
3) [...] to automatically generate new tests

Disadvantages of capture-replay
Cannot continue from errors
  errors are found manually in the recording process
  => only rerunning a "good" test is automated (regression)
Does not define test [...]
  implicit for a human ("all the rest is OK")
  (cannot detect unspecified errors, is inflexible, e.g. bitmap)

Monkey testing
Automated tools that execute random tests
(without a tester's knowledge of product functionality)
Dumb monkeys: completely ignore purpose (know just mouse / keyboard)
  but may have basic notions about windows / menus / buttons
Smart monkeys: have a state model of the application,
  explore transitions between these states
++ can sometimes find 10-20% of errors [Nyman, Microsoft, 2000]
++ good preliminary coverage (e.g.: 65% in 15 min for a text editor)
++ completely automated, no human effort for test capture
-- "dumb": the only bug known to monkeys is a system crash
-- => errors are hard to record and reproduce
++ runs independently, unsupervised, minimal resources (cost)

e.g. any unit testing framework
useful in [...]
[...] = problem of the test oracle: did the test pass?
Nontrivial, often needs manual inspection
Risks:
- undetected errors (imprecision)
- false warnings => cost of manual checking
Ex: comparing continuous signals (in the automotive industry)
    image comparison (for screen / printer)

Relatively easy: generating test skeletons (declarations + calls)
More difficult: intelligent generation of relevant data (coverage)

1) Data-driven architecture
separates data from test structure (like in programs)
Example: table, row = test; columns = test parameters
A script generates a test case for every table row
Minimal reasonable coverage: every pair of parameter values
(for every combination of values, the number is exponential)
2) Framework-based architecture
A library of functions separates testing from the UI
e.g. open(file), independent of the actions for opening (menu, button click, keyboard, etc.)
++ reuse for frequent actions
++ indirection / insulation from the testing tool
-- costly, amortized only in future releases

Automatable ([...]) for specs in a well-defined language
Starting from documentation: tabular spec, e.g. [Pettichord]
  Test ID  | Operation | Table | Name        | Type      | Nulls
  dtbed101 | Add Col   | TB03  | NEW INT COL | CHAR(100) | Y
Important: choose a format easily understandable by the user
A translator / test interpreter generates the test driver from the table
or interfaces with the (commercial) testing tool used
++ requirement-driven (what, not how), independent of implementation and testing tool, self-documented

More advanced: automated test generation from specs in a formal language
e.g. decision tables in RSML in the TCAS-II aviation protocol
     test generation from timing diagrams in embedded systems

Models: finite automata, UML, Statecharts (hierarchical automata),
Message Sequence Charts, timed automata, Petri nets, Markov chains
Test generation criteria: satisfactory model coverage
  all states / transitions; combinations of k consecutive transitions ([...])
++ facilitates generation of relevant tests
-- investment in model building and maintenance

Testing based on [...] by [...], starting from specifications:
1) question:
can the model reach a given state ? 2) if so, a will generate an example trace = test case Goal: exercising program, satisfying a => needs: instrumentation to measure test coverage How: set of : random choice + directed search (to reach branches not yet covered) : executing program using expression with symbolic variables, rather than concrete (numeric) values Symbolic execution gathers for followed branches Satisfiability of conditions is checked with specialized tools (satisfiability checkers, constraint solvers) => generate input data that will exercise that path or prove path is infeasible => stops exploring that path described as early as 1976 (James C King) program is executed by a special interpreter, using inputs => results in symbolic execution tree tree traversai stops when path condition becomes unsatisfiable Test generation purpose: attaining high coverage sometimes, reaching a specific branch Successful mature technique, hundreds of papers, many tools: Java Pathfinder, (j)CUTE, Crest, KLEE, Pex, SAGE, for С СТТ, C#, Java, more recently JavaScript Classic explores each execution path independently -1C1 Л -1C2 -1C1 Л C2 Ci Л -1C3 Ci Л C3 Problem: must express all program language semantics as formula solving arbitrary formulas impossible (limited to simple arithmetic) reality: complex math, library function, environment solution: model libraries & environment e g KLEE tool has models for some 40 syscalls (2 5 kloc) execution is directed by run (hence: "concolic" When symbolic execution is infeasible, perform a concrete execution step e g nonlinear arithmetic, library system functions function explore(pat 7cond = [ci, C2, , cn]) for к = n downto 1 do inputs = solve pathcond = Ci Л Л c j i Л -iq (flip q) rerun with new inputs; capture new pathcond’ explore( pathcond’) Problem: by using concrete values, might not reach desired path у = hash(x);    сап ’t solve hash formula => у is if (x + у > 0)    path 1 else    path 2 Assume: x = 20; у = hash(20) = 13 => 
To reach path 2, negate x + y > 0, with concrete y (constant 13)
The solver might return, e.g., x = -15,
but we might have hash(-15) = 27 (can't predict), and then x + y > 0:
execution still follows path 1
=> the desired path is missed; worst case: degrades to random testing

Other areas where test automation applies
In regression testing: need only store tests and expected results
   (and means to automate comparison)
Testing user interfaces (discussed earlier)
Testing compilers / translators:
   automated test generation starting from the input grammar
   explores random / statistic combinations of grammar rules
Load / stress testing: random; quantity rather than content is relevant
Fuzz testing: generate large quantities of random / possibly hostile input,
   to detect input validation errors or security vulnerabilities
e.g. RANDOOP [Microsoft]: 4M tests in 150 CPU hours / 15 person-hours;
   30 bugs in code tested for 200 person-years, vs 20 errors/year found manually
   see also http://research.microsoft.com/en-us/projects/Pex/
e.g. American Fuzzy Lop, http://lcamtuf.coredump.cx/afl/
   - maintains a queue of test inputs
   - mutates inputs using several strategies
   - if new coverage is achieved, adds the mutant to the input queue
   - minimizes each test input (keeping coverage)
   - minimizes the input corpus (avoids overlap)
   - records transitions between program basic blocks
   - classifies runs into crashes / hangs / normal exit
   highly successful, found many security vulnerabilities
   mutating inputs can synthesize interesting formats (e.g. images)
   can identify format fields with various meanings
   (length, checksum, payload, control opcode, etc.)

After automating detection => help in fault localization
Minimizing test inputs:
   binary search, finds the (file) input half that caused the error
Minimizing differences between a correct and an erroneous run:
   also binary search, for two close inputs
Fault localization in space:
   in a debugger, compare the execution state of a correct and a buggy run
   detect (precisely / statistically) invariants / patterns
   violated by the erroneous run
Fault localization in time:
   compare erroneous runs and find the points where variables
   start affecting the output
Delta debugging [Zeller]: partial automation of these techniques
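The binary-search minimization of a failing input mentioned above can be sketched in C as follows. This is only a minimal sketch, not Zeller's delta debugging algorithm: it keeps whichever half of the input still fails and stops as soon as neither half fails on its own. The oracle fails() is hypothetical (it stands in for running the program under test), and the names minimize, start and len are chosen here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical oracle: the "program under test" fails
   on any input that contains the byte 'X'. */
static bool fails(const char *in, size_t n)
{
    return memchr(in, 'X', n) != NULL;
}

/* Shrinks the failing window [*start, *start + *len) of in by halving,
   as long as one of the two halves still fails by itself. */
static void minimize(const char *in, size_t *start, size_t *len)
{
    assert(in && start && len);
    while (*len > 1) {
        size_t half = *len / 2;
        if (fails(in + *start, half)) {
            *len = half;                      /* first half still fails */
        } else if (fails(in + *start + half, *len - half)) {
            *start += half;                   /* second half still fails */
            *len -= half;
        } else {
            break;   /* the failure needs bytes from both halves: stop */
        }
    }
}
```

When the failure depends on bytes from both halves, the loop stops early; handling that case with finer-grained chunks is what full delta debugging adds.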
Marius Minea
marius@cs.upt.ro
13 December 2016

Preprocessing is done prior to compilation: cpp or gcc -E

   #define NAME replacement
   #define LEN 20
   #define NAME(arg1, ..., argn) replacement
   #define MAX(a,b) ((a)>(b)?(a):(b))
   #define NAME(arg1, arg2, ...) replacement
      can use __VA_ARGS__ to refer to the extra arguments
   #define NEEDS_MATH_H
      defines a symbol without value: used in conditional compilation
   #undef SOME_DEFINED_NAME
      undefines a defined macro

Macros are NOT variables
They are like find-replace in a text;
the actual compiler never sees macros, just the code after replacement
Careful with macros: put args in parentheses in the macro body
Don't use arguments with side effects if an arg is evaluated twice: MAX(x++, y)

In macro replacements:
   #arg produces a string literal for the tokens represented by arg
   x ## y produces the concatenation of the tokens for x and y

   #define STR(s) #s
   #define STRSUB(s) STR(s)          /* expands s first, then stringifies */
   #define JOIN(x,y) x ## y
   #define SFMT(m) "%" STRSUB(m) "s" /* adjacent string literals are joined */
   #define MAX 32
   char s[MAX + 1];
   scanf(SFMT(MAX), s);              /* format is "%32s" */

The C preprocessor supports conditionals (#if, #ifdef, #ifndef, #else, #endif),
using constant expressions;
only the corresponding branch of the code will be compiled

   #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
      uint16_t x = ...;              /* byte-order-specific code */
   #endif

   #include <file.h>     search in system directories
   #include "file.h"     search current dir first, then system

Include guards, e.g. to avoid multiple inclusion:

   #ifndef MYHEADER_H
   #define MYHEADER_H
   ...
   #endif

Complex programs are written by multiple users, in multiple files
How to share variables and functions (global identifiers)?
How to ensure functions are used consistently (right parameters)?
How to declare one's own identifiers without conflict with others?

Scope of identifiers: where is the identifier visible?
- block scope: from declaration to end of enclosing }
- file scope: if declared outside any block
- also: function prototype scope (ID in function header)
        function scope (labels: can't jump out)
- if redeclared, the outer scope is hidden while the inner scope is in effect

Linkage of identifiers: do they refer to the same object?
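The scope rules above (file scope, block scope, hiding on redeclaration) can be checked with a small program. This is an illustrative sketch; the names counter and shadow_demo are made up for the example.

```c
#include <assert.h>

int counter = 10;                 /* file scope: visible to the end of the file */

int shadow_demo(void)
{
    int seen = counter;           /* still the file-scope counter: 10 */
    int counter = 1;              /* block scope: hides the file-scope one from here on */
    {
        int counter = 2;          /* inner block: hides the outer block's counter */
        assert(counter == 2);
        seen += counter;          /* 10 + 2 */
    }                             /* inner counter ends here */
    return seen + counter;        /* 12 + 1: the function-level counter is visible again */
}
```

The file-scope counter is never modified: each redeclaration creates a distinct object, and the outer one becomes visible again when the inner scope ends.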
- external linkage: same in all translation units (files) making up the program
     default for functions and file scope identifiers;
     explicit with an extern declaration
- internal linkage: same within one translation unit; if declared static
- no linkage: each declaration denotes a distinct object (for block scope)

Storage duration of objects
- automatic, for variables declared with block scope (without static)
     lifetime: from block entry to exit; re-initialized every time
- static: lifetime is program execution; initialized once
- allocated: with malloc
- thread: for thread-local objects (since C11)

An identifier can be declared multiple times, but defined only once
A declaration with initializer is a definition
A file scope declaration with no initializer and no storage class specifier
or with static is a tentative definition
   several tentative definitions for the same object must match
   they become a definition by the end of the translation unit
functions: define in one file, declare in all others
variables: define in one file, declare (extern) in all others
Can put declarations in a header file, and include it where needed

mylibrary.h: made for typedefs, function declarations (NOT definitions / bodies),
   macros, declarations of global variables (like errno), etc.
   NO definitions (would duplicate if header included in many .c files)

   #ifndef MYLIBRARY_H
   #define MYLIBRARY_H
   ...
   #endif

mylibrary.c: definitions for the declarations from the .h
   (function / variable definition; struct definition if only pointer in .h)
   + all implementation details that should be hidden from the user
   includes mylibrary.h (declaration / definition consistency)
library compiled to an object file:
   gcc -c mylibrary.c produces mylibrary.o (with symbols for function names)
main file has #include "mylibrary.h" and uses the functions
   compile with gcc program.c mylibrary.o

An abstract datatype (ADT) is a mathematical model for data structures,
defined by the operations applicable to them (functions)
and the constraints among them (axioms),
without exposing details about the implementation
ADTs separate interface from implementation:
   the interface is exposed, the implementation is hidden
ADTs allow changeable and interchangeable implementations:
   a client program relies only on the interface, so it is not affected

Def: A list is empty, or an element followed by a list
An ADT list L with element type E is usually defined by:

   nil : () -> L          empty list constructor
                          (can also be a constant rather than a function)
   isempty : L -> Bool    is the list empty?
   cons : E x L -> L      constructor: new list from element and rest
   head : L -> E          first element
   tail : L -> L          list with all elements after the head

and the axioms: head(cons(e, l)) = e and tail(cons(e, l)) = l

Some languages have lists as a recursive data type:
a sum type (alternative) between
(1) the value for the empty list, and
(2) a product type of an element and a list (constructor cons)

For structure types, encapsulation is enforced if:
   the header file only contains the declaration
      typedef struct mytype *mytype_t;
   the C file for the implementation contains struct mytype { ... };
   exported functions only work with mytype_t
=> not knowing the structure, the user program cannot access fields
For example, the intlist datatype below enforces such an encapsulation

   intlist.h:
   #ifndef INTLIST_H
   #define INTLIST_H
   typedef struct ilst *intlist_t;
   intlist_t empty(void);
   int isempty(intlist_t lst);
   int head(intlist_t lst);
   intlist_t tail(intlist_t lst);
   intlist_t cons(int el, intlist_t tl);
   intlist_t decons(intlist_t lst, int *elp);
   #endif

14 December 2016

Testing has repetitive components, so automation is justified
The problem is the cost of automation [Kaner]
Time for: test creation, checking their functionality, documentation
Is automation reusable? (if the program evolves)
Is maintenance needed? (GUI change, internationalization)
Does it delay finding bugs? (fewer resources to run tests)
Does it find enough bugs? Or are most found by manual testing?
Is it powerful enough? Or does it automate only "easy" tests?
1) Record user actions and the resulting screen (bitmap)
   => most primitive level
   - other checks: only with tester effort (interrupt, insert)
   - fragile: susceptible to any product change
   - possible comparison errors in the resulting image
2) Script with higher-level actions (select menu, button)
   - more flexible, but does not check the graphic layout
     (low level: font, text size, overwrite, etc.)
3) Scripts that automatically generate new tests

Disadvantages of capture-replay
Cannot continue from errors:
   errors are found manually in the recording process
   => only rerunning a "good" test is automated (regression)
Does not define the tests that are implicit for a human ("all the rest is OK")
   (cannot detect unspecified errors; is inflexible, e.g. bitmap comparison)

Automated tools that execute random tests ("monkeys")
(without a tester's knowledge of product functionality)
Dumb monkeys: completely ignore purpose (know just mouse and keyboard),
   but may have basic notions about windows, menus, buttons
Smart monkeys: have a state model of the application,
   explore transitions between these states
++ can sometimes find 10-20% of errors [Nyman, Microsoft, 2000]
++ good preliminary coverage (e.g. 65% in 15 min for a text editor)
++ completely automated, no human effort for test capture
-- "dumb": the only bug known to monkeys is a system crash
-- => errors are hard to record and reproduce

Automated test running
++ runs independently, unsupervised, with minimal resources (cost)
e.g. any unit testing framework
Main difficulty = the problem of the test oracle: did the test pass?
Nontrivial, often needs manual inspection
Risks:
- undetected errors (imprecision)
- false warnings: cost of manual checking
Ex: comparing continuous signals (in the automotive industry);
    image comparison (for screen, printer)

Relatively easy: generating test skeletons (declarations + calls)
More difficult: intelligent generation of relevant data (coverage)

1) Data-driven architecture: separates data from test structure (as in programs)
   Example: table; row = test, columns = test parameters
   A script generates a test case for every table row
   Minimal reasonable coverage: every pair of parameter values
   (the number of all combinations of values is exponential)
2) Library-based architecture: a library of functions separates testing from the UI
   e.g. open(file), independent of the actions for opening
   (menu, button click, keyboard, etc.)
   ++ reuse for frequent actions
   ++ indirection: insulation from the testing tool
   -- costly, amortized only in future releases

Specification-based testing
Automatable for specs in a well-defined language
Starting from documentation: tabular spec, e.g. [Pettichord]

   Test ID   Operation  Table  Name         Type       Nulls
   dtbed101  Add Col    TB03   NEW_INT_COL  CHAR(100)  Y

important: choose a format easily understandable by the user
A translator / test interpreter generates the test driver from the table
or interfaces with the (commercial) testing tool used
++ requirement-driven (what, not how), independent of implementation
   and testing tool, self-documented
More advanced: automated test generation from specs in a formal language
   e.g. decision tables in RSML in the TCAS-II aviation protocol
   test generation from timing diagrams in embedded systems

Model-based testing
Models: finite automata, UML, Statecharts (hierarchical automata),
   Message Sequence Charts, timed automata, Petri nets, Markov chains
Test generation criteria: satisfactory model coverage:
   all states / transitions; combinations of k consecutive transitions
++ facilitates generation of relevant tests
-- investment in model building and maintenance

Test generation by model checking, starting from specifications:
1) question: can the model reach a given state?
2) if so, a model checker will generate an example trace = test case

Coverage-directed test generation
Goal: exercising the program, satisfying a coverage criterion
=> needs: instrumentation to measure test coverage
How: a set of techniques:
- random choice + directed search (to reach branches not yet covered)
- symbolic execution: executing the program using expressions with
  symbolic variables, rather than concrete (numeric) values

Symbolic execution gathers path conditions for the followed branches
Satisfiability of the conditions is checked with specialized tools
(satisfiability checkers, constraint solvers)
=> generate input data that will exercise that path
   or prove the path is infeasible => stop exploring that path
described as early as 1976 (James C. King)
the program is executed by a special interpreter, using symbolic inputs
=> results in a symbolic execution tree
tree traversal stops when the path condition becomes unsatisfiable
Test generation purpose: attaining high coverage;
   sometimes, reaching a specific branch
Successful, mature technique, hundreds of papers, many tools:
   Java Pathfinder, (j)CUTE, Crest, KLEE, Pex, SAGE,
   for C/C++, C#, Java, more recently JavaScript

Classic symbolic execution explores each execution path independently
[Figure: execution tree with path conditions
 ¬C1 ∧ ¬C2,  ¬C1 ∧ C2,  C1 ∧ ¬C3,  C1 ∧ C3]
Problem: must express all program / language semantics as formulas
   solving arbitrary formulas is impossible (limited to simple arithmetic)
   reality: complex math, library functions, environment
   solution: model libraries & environment
   e.g. the KLEE tool has models for some 40 syscalls (2.5 kloc)

Concolic execution: symbolic execution is directed by a concrete run
(hence "concolic" = concrete + symbolic)
When symbolic execution is infeasible, perform a concrete execution step
   e.g. nonlinear arithmetic, library / system functions

   function explore(pathcond = [c1, c2, ..., cn])
     for k = n downto 1 do
       inputs = solve(c1 ∧ ... ∧ c(k-1) ∧ ¬ck)    (flip ck)
       rerun with new inputs; capture new pathcond'
       explore(pathcond')

Problem: by using concrete values, we might not reach the desired path

   y = hash(x);       // can't solve hash formula => y is concrete
   if (x + y > 0)     // path 1
   else               // path 2

Assume: x = 20; y = hash(20) = 13 => path 1 is taken (x + y = 33 > 0)
To reach path 2, negate x + y > 0, with concrete y (constant 13)
The solver might return, e.g., x = -15,
but we might have hash(-15) = 27 (can't predict), and then x + y > 0:
execution still follows path 1
=> the desired path is missed; worst case: degrades to random testing

Other areas where test automation applies
In regression testing: need only store tests and expected results
   (and means to automate comparison)
Testing user interfaces (discussed earlier)
Testing compilers / translators:
   automated test generation starting from the input grammar
   explores random / statistic combinations of grammar rules / nonterminals
Load / stress testing: random; quantity rather than content is relevant
Fuzz testing: generate large quantities of random / possibly hostile input,
   to detect input validation errors or security vulnerabilities
e.g. RANDOOP [Microsoft]: 4M tests in 150 CPU hours / 15 person-hours;
   30 bugs in code tested for 200 person-years, vs 20 errors/year found manually
   see also http://research.microsoft.com/en-us/projects/Pex/
e.g. American Fuzzy Lop, http://lcamtuf.coredump.cx/afl/
   - maintains a queue of test inputs
   - mutates inputs using several strategies
   - if new coverage is achieved, adds the mutant to the input queue
   - minimizes each test input (keeping coverage)
   - minimizes the input corpus (avoids overlap)
   - records transitions between program basic blocks
   - classifies runs into crashes / hangs / normal exit
   highly successful, found many security vulnerabilities
   mutating inputs can eventually synthesize interesting formats (e.g. images)
   can identify format fields with various meanings
   (length, checksum, payload, control opcode, etc.)

After automating detection => help in fault localization
Minimizing test inputs:
   binary search, finds the input half that caused the error (e.g. for file inputs)
Minimizing the differences between a correct and an erroneous run:
   also binary search, for two inputs that are as close as possible
Fault localization in space:
   in a debugger, compare the execution state between a correct and a buggy run
   detect (precisely or statistically) invariants / patterns
   violated by the erroneous run
Fault localization in time:
   compare erroneous runs
   and find the points where variables start affecting the output
Delta debugging [Zeller]: partial automation of these techniques

Marius Minea
marius@cs.upt.ro
18 December 2017

Large programs are written by many users, in multiple files,
which can be compiled separately ("translation units"),
then linked into a single executable
Need to control the sharing of variables and functions:
   allow shared use of functions / variables
   allow declarations which stay local (no name conflicts)
   ensure functions are used consistently (with the right parameters)
This is controlled through the scope and linkage of identifiers

Scope of identifiers: where is the identifier visible?
- block scope: from declaration to end of enclosing }
- file scope: if declared outside any block
- also: function prototype scope (ID in function header)
        function scope (labels: can't jump out)
- if redeclared, the outer scope is hidden while the inner scope is in effect

Linkage of identifiers: do they refer to the same object?
- external linkage: same in all translation units (files) making up the program
     default for functions and file scope identifiers;
     explicit with an extern declaration
- internal linkage: same within one translation unit; if declared static
- no linkage: each declaration denotes a distinct object (for block scope)

Storage duration of objects
- automatic, for variables declared with block scope (without static)
     lifetime: from block entry to exit; re-initialized every time
- static: lifetime is program execution; initialized once
- allocated: with malloc
- thread: for thread-local objects (since C11)

An identifier can be declared multiple times, but defined only once
A declaration with initializer is a definition
A file scope declaration with no initializer and no storage class specifier
or with static is a tentative definition
   several tentative definitions for the same object must match
   they become a definition by the end of the translation unit
functions: define in one file, declare in all others
variables: define in one file, declare (extern) in all others
Can put declarations in a header file, and include it where needed

mylibrary.h: made for typedefs, function declarations (NOT definitions / bodies),
   macros, declarations of global variables (like errno), etc.
   NO definitions (would duplicate if header included in many .c files)

   #ifndef MYLIBRARY_H
   #define MYLIBRARY_H
   ...
   #endif

mylibrary.c: definitions for the declarations from the .h
   (function / variable definition; struct definition if only pointer in .h)
   + all implementation details that should be hidden from the user
   includes mylibrary.h (declaration / definition consistency)
gcc -c mylibrary.c compiles to the object file mylibrary.o,
   which contains code for the functions and symbols for the function names
main file has #include "mylibrary.h", uses functions, types, and macros
   gcc program.c mylibrary.o links it with the library

An abstract datatype (ADT) is a mathematical model for data structures,
defined by the operations applicable to them (functions)
and the constraints among them (axioms),
without exposing details about the implementation
ADTs separate interface from implementation:
   the interface is exposed, the implementation is hidden
ADTs allow changeable and interchangeable implementations:
   a client program relies only on the interface, so it is not affected

C provides the FILE * type to work with files
A FILE * should only be used with the functions from stdio.h:
   a value of type FILE * is obtained from library functions such as fopen
   we can't meaningfully dereference a FILE *, not knowing the type:
   it's some structure whose layout belongs to the library implementation
   can't index, no pointer arithmetic, etc., only standard functions

Def: A list is empty, or an element followed by a list
An ADT list L with element type E is usually defined by:

   nil : () -> L          empty list constructor
                          (can also be a constant rather than a function)
   isempty : L -> Bool    is the list empty?
   cons : E x L -> L      constructor: new list from element and rest
   head : L -> E          first element
   tail : L -> L          list with all elements after the head

and the axioms: head(cons(e, l)) = e and tail(cons(e, l)) = l

Some languages have lists as a recursive data type:
a sum type (alternative) between
(1) the value for the empty list, and
(2) a product type of an element and a list (constructor cons)

For structure types, encapsulation is enforced if:
   the header file only contains the declaration
      typedef struct mytype *mytype_t;
   the C file for the implementation contains struct mytype { ... };
   exported functions only work with mytype_t
=> not knowing the structure, the user program cannot access fields
The intlist datatype below also enforces such an encapsulation

   intlist.h:
   #ifndef INTLIST_H
   #define INTLIST_H
   typedef struct ilst *intlist_t;
   intlist_t empty(void);
   int isempty(intlist_t lst);
   int head(intlist_t lst);
   intlist_t tail(intlist_t lst);
   intlist_t cons(int el, intlist_t tl);
   intlist_t decons(intlist_t lst, int *elp);
   #endif

   intlist.c:
   struct ilst { int el; intlist_t nxt; };
   intlist_t empty(void) { return NULL; }
   int isempty(intlist_t lst) { return lst == NULL; }
   int head(intlist_t lst) { return lst->el; }
   intlist_t tail(intlist_t lst) { return lst->nxt; }
   intlist_t cons(int el, intlist_t tl) {
     intlist_t p = malloc(sizeof(struct ilst));
     if (!p) return NULL;
     p->el = el; p->nxt = tl;
     return p;
   }
   intlist_t decons(intlist_t lst, int *elp) {
     if (elp) *elp = lst->el;     /* hand back the head element if requested */
     intlist_t tl = lst->nxt;
     free(lst);                   /* release the first cell */
     return tl;
   }

If the header file declares (exposes) only a pointer type to the data,
the implementation is hidden:
   incomplete structure type: typedef struct ilst *intlist_t;
   or a void * (but dangerous: no type safety)
Declaration of the structure should be hidden in the .c file,
not exposed in the .h file (which is included by all clients)
   struct ilst { int el; intlist_t nxt; };
If a library client has this structure, the datatype is no longer abstract:
   the client can use the internal representation,
   change the structure in-place, etc.

C does not have polymorphism or parametric types
=> cannot declare, e.g., a list of an arbitrary element type
Could do: typedef int elemtype; (or even a #define)
   and have everything else use elemtype
But need to recompile everything when changing elemtype:
   binary code differs even for assignment / parameter passing
   due to varying element size; even more so for addition, etc.
If instead of values we store pointers to values,
we can have just one implementation (list of void *)
   must separately allocate memory for elements
   program logic must know the element type (info not in the list)

To modify the list in-place, we need access to the representation:
   struct ilst { int el; intlist_t nxt; };

List reversal with two pointers, splitting the list:
   one to the part of the list already reversed (initially NULL)
   one to the rest of the list to be reversed (initially the full list)

   intlist_t rev2(intlist_t rest, intlist_t done) {
     if (isempty(rest)) return done;
     intlist_t nxt = rest->nxt;
     rest->nxt = done;
     return rev2(nxt, rest);
   }
   intlist_t rev(intlist_t lst) { return rev2(lst, empty()); }

When inserting / deleting into a linked list (e.g. an ordered list),
must change the link in the cell previous to the one inserted / deleted
keep the address of the pointer to be changed (address of the link field)
better than working with the address of the previous element (may not exist)

   intlist_t hd = cons(3, cons(4, cons(7, NULL)));
   void printaddr(intlist_t lst) {   /* function name assumed; lost in the source */
     for (intlist_t *adr = &lst; *adr; adr = &(*adr)->nxt)
       printf("adr : %p, *adr: %p\n", (void *)adr, (void *)*adr);
   }

   adr : 0x4dea8, *adr: 0xda050
   adr : 0xda058, *adr: 0xda030
   adr : 0xda038, *adr: 0xda010

[Figure: the three list cells; the top row denotes the addresses of individual fields
   el at 0xda050 = 3, nxt at 0xda058 = 0xda030
   el at 0xda030 = 4, nxt at 0xda038 = 0xda010
   el at 0xda010 = 7, nxt at 0xda018 = NULL
   hd = 0xda050; lst (at 0x4dea8) = 0xda050; adr = 0x4dea8]

   intlist_t rdlist(void) {
     intlist_t hd, *adr = &hd;
     for (int n; scanf("%d", &n) == 1; adr = &(*adr)->nxt)
       (*adr = malloc(sizeof(*hd)))->el = n;   /* new cell stored through adr */
     *adr = NULL;                              /* terminate the list */
     return hd;
   }

[Figure: successive values of adr and hd while rdlist reads 3, 4, ...:
 adr starts at the address of hd, then moves to the nxt field of each new cell]
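The advice above, keeping the address of the link to be changed rather than the address of the previous cell, can be sketched for insertion into an ordered list and for deletion. This is an illustrative sketch under the assumption that the structure declaration is visible; the function names insert_sorted and remove_first are hypothetical and not part of the intlist interface from the lecture.

```c
#include <assert.h>
#include <stdlib.h>

struct ilst { int el; struct ilst *nxt; };
typedef struct ilst *intlist_t;

/* Inserts el into the ascending list whose head pointer is *phd.
   Returns 0 on success, -1 if allocation fails. */
int insert_sorted(intlist_t *phd, int el)
{
    assert(phd);
    intlist_t *adr = phd;                 /* address of the link that may change */
    while (*adr && (*adr)->el < el)
        adr = &(*adr)->nxt;               /* advance to the next link field */
    intlist_t p = malloc(sizeof *p);
    if (!p) return -1;
    p->el = el;
    p->nxt = *adr;                        /* new cell points to the rest */
    *adr = p;                             /* the link (or the head) now points to it */
    return 0;
}

/* Removes the first cell holding el. Returns 1 if a cell was removed, 0 otherwise. */
int remove_first(intlist_t *phd, int el)
{
    assert(phd);
    intlist_t *adr = phd;
    while (*adr && (*adr)->el != el)
        adr = &(*adr)->nxt;
    if (!*adr) return 0;
    intlist_t dead = *adr;
    *adr = dead->nxt;                     /* bypass the cell, also works for the head */
    free(dead);
    return 1;
}
```

Because adr may point at the head pointer itself, inserting or deleting at the front needs no special case.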
Queue: first-in, first-out (FIFO): insert / remove at different ends

   queue.h:
   #ifndef QUEUE_H
   #define QUEUE_H
   typedef struct q *queue_t;
   queue_t q_new(void);
   ... (queue_t q);
   ... (queue_t q);
   queue_t q_put(queue_t q, ...);
   ... (queue_t q);
   ... (queue_t q);
   #endif

Use a dummy cell before the actual first element;
each get deletes it, and the next cell becomes the dummy
invariant: an empty queue has hd == last

   typedef struct e { ... struct e *nxt; } elem_t;
   ...
   q->hd = q->last = malloc(sizeof(elem_t));
   q->hd->nxt = NULL;
   return q;

Marius Minea
marius@cs.upt.ro
19 December 2016

Briefly: the compiler translates source code to an executable
First step: produce an object file: gcc -c file.c -> file.o
   has binary (executable) code for all functions
   contains symbols (names) of functions / variables in the source
   and external symbols (e.g. library functions) defined elsewhere
Can also produce assembly: gcc -S file.c -> file.s
   (human-readable version of the executable code)
Second step: link object files together (and with the standard library)
   gcc file1.o file2.o -> a.out
   resolves (links) symbols used in one module and defined in another
More linking is done by the operating system at program start (for dynamic libraries)
   one memory copy of a library can be shared by many programs

Use of the (standard) library so far:
   we know a prototype (declaration), e.g.
      FILE *fopen(const char *fname, const char *mode);
   it is included from stdio.h
   we do not have the source code for fopen,
   only the object code which is part of the library
   the last compile stage links the program with the library
The program is independent of underlying details (Unix / Windows? file system type?)
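A client of the standard library written this way touches files only through the prototypes in stdio.h. The following is a minimal sketch of that usage; the function name roundtrip is made up for the example, and it assumes the platform allows creating a temporary file with tmpfile().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes text to a temporary file and reads the first line back into out
   (at most n - 1 characters). Returns 0 on success, -1 on any failure. */
int roundtrip(const char *text, char *out, int n)
{
    assert(text && out && n > 0);
    FILE *f = tmpfile();          /* a FILE * only ever comes from the library */
    if (!f) return -1;
    int ok = fputs(text, f) != EOF;
    rewind(f);                    /* back to the start before reading */
    ok = ok && fgets(out, n, f) != NULL;
    fclose(f);
    return ok ? 0 : -1;
}
```

Nothing here depends on how FILE is laid out or on which operating system stores the file, so the same source compiles unchanged wherever the library is available.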
The implementation of a library function can change
(new compiler version, bug fix, new file system)
as long as the interface (function prototype) stays the same

An abstract datatype (ADT) is a mathematical model for data structures,
defined by the operations applicable to them (functions)
and the constraints among them (axioms),
without exposing details about the implementation
ADTs separate interface from implementation:
   the interface is exposed, the implementation is hidden
ADTs allow changeable and interchangeable implementations:
   a client program relies only on the interface, so it is not affected
FILE is an abstract datatype in the standard C library
   we don't know the implementation details
   we can only access it with the given functions (fopen, fgets, fread, etc.)

An ADT list L with element type E is usually defined by:

   nil : () -> L          empty list constructor
                          (can also be a constant rather than a function)
   isempty : L -> Bool    is the list empty?
   cons : E x L -> L      constructor: new list from element and rest
   head : L -> E          first element
   tail : L -> L          list with all elements after the head

and the axioms linking these functions:
   head(cons(e, l)) = e and tail(cons(e, l)) = l
      (can be seen as the definition of cons)
   isempty(nil()) = true, isempty(cons(e, l)) = false
   head, tail are undefined for a list which isempty

   intlist.h:
   #ifndef INTLIST_H
   #define INTLIST_H
   typedef struct ilst *intlist_t;
   intlist_t empty(void);
   int isempty(intlist_t lst);
   int head(intlist_t lst);
   intlist_t tail(intlist_t lst);
   intlist_t cons(int el, intlist_t tl);
   intlist_t decons(intlist_t lst, int *elp);
   #endif

If the header file declares (exposes) only a pointer type to the data,
the implementation is hidden:
   incomplete structure type: typedef struct ilst *intlist_t;
   or a void * (but dangerous: no type safety)
Declaration of the structure should be hidden in the .c file,
not exposed in the .h file (which is included by all clients)
   struct ilst { int el; intlist_t nxt; };
If a library client has this structure,
it can use the internal representation (no longer an ADT)

   intlist.c:
   struct ilst { int el; intlist_t nxt; };
   intlist_t empty(void) { return NULL; }
   int isempty(intlist_t lst) { return lst == NULL; }
   int head(intlist_t lst) { return lst->el; }
   intlist_t tail(intlist_t lst) { return lst->nxt; }
   intlist_t cons(int el, intlist_t tl) {
     intlist_t p = malloc(sizeof(struct ilst));
     if (!p) return NULL;
     p->el = el; p->nxt = tl;
     return p;
   }
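As a side note on how a client uses the functions defined so far, here is a self-contained sketch. It repeats the definitions in one file only so that it compiles on its own; in the intended layout the client would see just intlist.h. The function sum is a hypothetical client routine, not part of the lecture's interface, and the assert calls in head and tail are an addition made here to flag use on an empty list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ilst *intlist_t;
struct ilst { int el; intlist_t nxt; };   /* would live in intlist.c only */

intlist_t empty(void) { return NULL; }
int isempty(intlist_t lst) { return lst == NULL; }
int head(intlist_t lst) { assert(lst); return lst->el; }        /* undefined on empty list */
intlist_t tail(intlist_t lst) { assert(lst); return lst->nxt; } /* undefined on empty list */

/* Returns NULL if allocation fails; note this is indistinguishable
   from the empty list, a limitation of this simple representation. */
intlist_t cons(int el, intlist_t tl)
{
    intlist_t p = malloc(sizeof(struct ilst));
    if (!p) return NULL;
    p->el = el;
    p->nxt = tl;
    return p;
}

/* Client code: written against the interface only, no field access. */
int sum(intlist_t lst)
{
    return isempty(lst) ? 0 : head(lst) + sum(tail(lst));
}
```

Since sum never mentions el or nxt, it would keep working if the representation behind intlist_t were replaced, which is the point of the ADT.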
   intlist_t decons(intlist_t lst, int *elp) {
     if (elp) *elp = lst->el;     /* hand back the head element if requested */
     intlist_t tl = lst->nxt;
     free(lst);                   /* release the first cell */
     return tl;
   }

C does not have polymorphism or parametric types
=> cannot declare, e.g., a list of an arbitrary element type
Could do: typedef int elemtype; (or even a #define)
   and have everything else use elemtype
But need to recompile everything when changing elemtype:
   binary code differs even for assignment / parameter passing
   due to varying element size; even more so for addition, etc.
If instead of values we store pointers to values,
we can have just one implementation (list of void *)
   must separately allocate memory for elements
   program logic must know the element type (info not in the list)

Assume: we know the declaration struct ilst { int el; intlist_t nxt; };
List reversal with two pointers, splitting the list:
   one to the part of the list already reversed (initially NULL)
   one to the rest of the list to be reversed (initially the full list)

   intlist_t rev2(intlist_t rest, intlist_t done) {
     if (isempty(rest)) return done;
     intlist_t nxt = rest->nxt;
     rest->nxt = done;
     return rev2(nxt, rest);
   }
   intlist_t rev(intlist_t lst) { return rev2(lst, empty()); }

A pointer p allows indirect access to a value:
   p is an address, *p is the value found at address p (either read or write)
Useful for communicating between program parts that have an address p:
   other functions that have p can change *p
   by reading *p we always have the latest value
Analogy: URL (address) vs web page contents (value, may be updated)

When inserting / deleting into a linked list (e.g. an ordered list),
must change the link in the cell previous to the one inserted / deleted
keep the address of the pointer to be changed (address of the link field)
better than working with the address of the previous element (may not exist)

   intlist_t hd = cons(3, cons(4, cons(7, NULL)));
   void printaddr(intlist_t lst) {   /* function name assumed; lost in the source */
     for (intlist_t *adr = &lst; *adr; adr = &(*adr)->nxt)
       printf("adr : %p, *adr: %p\n", (void *)adr, (void *)*adr);
   }

   adr : 0x4dea8, *adr: 0xda050
   adr : 0xda058, *adr: 0xda030
   adr : 0xda038, *adr: 0xda010

[Figure: the three list cells; the top row denotes the addresses of individual fields
   el at 0xda050 = 3, nxt at 0xda058 = 0xda030
   el at 0xda030 = 4, nxt at 0xda038 = 0xda010
   el at 0xda010 = 7, nxt at 0xda018 = NULL
   hd = 0xda050; lst (at 0x4dea8) = 0xda050; adr = 0x4dea8]

   intlist_t rdlist(void) {
     intlist_t hd, *adr = &hd;
     for (int n; scanf("%d", &n) == 1; adr = &(*adr)->nxt)
       (*adr = malloc(sizeof(*hd)))->el = n;   /* new cell stored through adr */
     *adr = NULL;                              /* terminate the list */
     return hd;
   }

[Figure: successive values of adr and hd while rdlist reads 3, 4, ...:
 adr starts at the address of hd, then moves to the nxt field of each new cell]

Queue: first-in, first-out (FIFO): insert / remove at different ends

   queue.h:
   #ifndef QUEUE_H
   #define QUEUE_H
   typedef struct q *queue_t;
   queue_t q_new(void);
   ... (queue_t q);
   ... (queue_t q);
   queue_t q_put(queue_t q, ...);
   ... (queue_t q);
   ... (queue_t q);
   #endif

Use a dummy cell before the actual first element;
each get deletes it, and the next cell becomes the dummy
invariant: an empty queue has hd == last

   typedef struct e { ... struct e *nxt; } elem_t;
   ...
   q->hd = q->last = malloc(sizeof(elem_t));
   q->hd->nxt = NULL;
   return q;

Marius Minea
9 January 2017

Why interface between languages?
   take advantage of different language features
   code reuse (libraries)
   efficiency (of libraries)
   increase acceptance by providing interfaces to other languages
Issues:
   function call mechanism (parameter passing)
   storage layout of objects
   naming conventions for external function symbols
   memory management (garbage collection)
   exception handling

Application Binary Interface (ABI)
= machine-level interface between program modules
Covers:
   size and alignment of data types
   calling convention
   how system calls are made
   function name mangling (for overloading, e.g. C++)

Calling conventions
cdecl: caller cleans up the stack
   args passed right to left
   regs eax, ecx, edx are caller-saved, the rest callee-saved
   result returned in eax
   typical for Linux / GCC
stdcall: callee cleans up the stack (must know the arg count)
   typical for the MS Win32 API

C and C++
simplest: just declare the function as extern "C"
ensures the function name is not mangled as in C++
(the symbol name is just the function name)

Python
Many Python libraries are written in C, so interfacing is natural
Python's ctypes module can:
   load C functions on the fly from shared libraries (DLLs)
   translate simple data types between C and Python

   import ctypes
   libc = ctypes.CDLL('/lib/libc.so.6')
   t = libc.time(None)    # call C function, None = NULL
   print(t)               # use result in Python
   (code: ctypes tutorial + Wikipedia)

types corresponding to C: c_int, c_char_p, etc.
   and corresponding values (None for NULL)
access to the representation: raw vs value for strings
Python bytes objects are immutable:
   use create_string_buffer() to call C functions which expect mutable memory

.NET: Platform Invocation Services (PInvoke)
Two options, depending on the availability of library source code
(and the need to marshal function arguments)
Implicit PInvoke (C++ interop)
   usable if parameter types have the same representation
   in managed and unmanaged memory: no conversion required
   better efficiency and type safety
Explicit PInvoke
   DllImportAttribute placed before the function declaration
   can specify the type of marshaling needed
   creates a managed entry point with the needed thunk (transition code)
   and simple data conversions
One more option: IJW (It Just Works)
   no DllImport declarations, but explicit marshalling code

Java
JNI: Java Native Interface
   historically first
JNA: Java Native Access
   community-developed, simpler, no boilerplate / glue code
JNR: Java Native Runtime
   current JEP (JDK Enhancement Proposal), good performance

JNI
A native function is written with two extra arguments:
   a JNIEnv pointer for the interface to the JVM,
      with lots of functions to interact with the JVM,
      e.g. convert arrays and strings
   a jobject reference to the current object
      (of the class where the native method is declared)
JNI pitfalls
Triggering array copies:
   arrays are passed as opaque handles;
   should use callbacks into the JVM to get / set elements
Reaching back instead of passing arguments:
   usual style: pass an object, access its fields
   here: each object access must reach (cross) back into the JVM
Native code must check for exceptions on JNI calls
Local references created have a lifetime until native code completion
Memory leaks: global references created and not garbage collected

JNA
simplified, no generated headers or wrappers for native code
pure Java implementation, based on the libffi library
   (library to interface with various calling conventions, calling
   any function based on a call interface description)
but: does not support C++
     slower
(data accesses in Java; copies between C and Java; cost of calls, since type information is determined at runtime, not statically)
- Java code following C data may be layout-dependent and ugly

JNR:
- aims to overcome the cumbersome parts and portability issues of JNI, and the performance problems of JNA
- also based on libffi, with several levels in between
- wide coverage of native functions (POSIX, etc.)
- proposed basis for a standard Java FFI

Marius Minea, marius@cs.upt.ro, 9 January 2016

A basic, widely encountered problem:
1) find whether we have seen a value before: store all values seen so far
2) retrieve information associated to some identifier k (called key): a function f(k) from key k to value (info)
Implemented as a map (dictionary, association) of (key, value) pairs.
In some languages, maps are primitive data types; others have libraries.

Hash tables: used for maps from arbitrary keys to values
- arrays only work when keys are integers (in a given range)
- idea: find a function h with an integer value in a restricted range (usable as index in an array)
- every object (key) x is stored in the array at index h(x) (usually, h(x) modulo table size)
- objects with different hash values are surely different
- different objects may have the same hash value (collision)

Hash functions need to:
- be fast (easily computed), mixing all bytes of the object
- have few collisions (esp. for objects with closely related values)
Clearly, collisions cannot be avoided if the domain is larger than the range.

Examples for strings:

    for (h = len; len--; ) h = ((h << 7) ^ (h >> 27)) ^ *s++;
    for (h = 5381; c = *s++; ) h += (h << 5) + c;
    for (h = 0; c = *s++; ) h = (h ...

(... 224 bits currently)

Open addressing: if a different object is found at index idx = h(x), continue the search using a sequence of indices:
- sequential: idx++
- linear: idx += i
- with another hash function: idx += h2(x)
until the element is found.
- when the table fills up, objects must be re-hashed
- deleted objects must be marked (≠ empty) to stop useless search

Chaining:
- an entry in the hash table is a (linked) list of objects with the same hash value
- hashing is followed by linear search in a (hopefully short) list
- need dynamic
allocation for list elements
- hash table size comparable to element count (avoid long lists)

Cuckoo hashing: constant-time lookup, amortized constant-time insert
- each key may be found in one of two locations (use two different hash functions)
- on collision, displace the existing key to its 2nd location; if that location is full, its occupant is displaced in turn, and so on
- if a cycle is reached, rebuild a (larger) table
- works well up to about 50% fill factor
[Figure: arrows show the alternate location for each key. Image: http://en.wikipedia.org/wiki/File:Cuckoo.svg]

time.h contains structures and functions to measure time
- clock_t and time_t are real types representing times
- struct tm holds a broken-down calendar time (sec, min, ..., year)
- struct timespec holds time in seconds and nanoseconds

    clock_t clock(void);

returns (an approximation of) the processor time used; divide by CLOCKS_PER_SEC (usually 10^6) to get time in seconds

    int timespec_get(struct timespec *ts, int base);

gives time in s and ns since a reference point base (use TIME_UTC)

    struct timespec { time_t tv_sec; long tv_nsec; };

Benchmarking:
- place the code to be benchmarked in a loop running many times; total time: order of seconds (account for limited clock precision)
- ensure the compiler doesn't optimize away the repetition (check the assembly), e.g. when computing/assigning the same value many times
- may need to use the volatile specifier for variables (forces writing/reading to memory every time, like in the source)
- repeat measurements and take an average
- time may be affected by other running processes, caching, etc.

Random numbers:
- only natural phenomena can be truly random
- a computer uses an algorithm to generate numbers => the period of the number generator should be high, and all bits should appear to be random
- the quality of the stdlib random number generator may not be high (esp. for lower bits)
- need to use a special RNG in cryptography applications

    int rand(void);

returns an integer in the range 0 to RAND_MAX (at least 2^15 - 1)
Re-running the program will produce the same sequence of numbers!
Need to initialize the state of the RNG with a seed:

    void srand(unsigned seed);

Could use calendar time (seconds) as seed: different in each run, e.g. srand((unsigned) time(NULL));
Error handling is absolutely needed for any environment interaction.
It is also needed when a proper result can't be returned: non-numeric string to number; 5th element of a 3-element list.
Error situations can happen anywhere in the "normal" control flow: end-of-file, read error, insufficient memory, or user-level errors (input does not match format).
- handling complicates code and obscures the main functionality
- functions must be designed to return error conditions, which complicates their interface
- user code has to check for errors and propagate recovery up from deep within processing

Exceptions are a control flow mechanism
- different from function call/return and breaking from loops
- can transfer control across functions
Exceptions are raised and caught (handled); they can be raised by a library function, or by the user.
Imagine a statement that says:
    setup exception in protected-code with handler-code
When this is executed, the runtime system sets things up so that if the named exception appears (is raised) when executing protected-code, control is transferred to the handling code. If nothing happens, execution proceeds with the next statement.
Syntax varies:
    Java: try protected-code catch ( exception ) handler-code
    ML: try protected-code with exception -> handler-code

    jmp_buf myexc;
    if (setjmp(myexc)) { ...
    longjmp(myexc, nonzero);

Can handle in a switch, to distinguish values from longjmp:

    switch (setjmp(myexc)) {
      case 0: ...; break;
      case val1: ...; break;
      case val2: ...; break;
    }

11 January 2017

Security testing = test that the system has the desired
- confidentiality
- integrity
- authentication
- authorization
- availability
- non-repudiation

Threat: a potential event with undesired consequences if it materializes into an attack
Vulnerability: a weakness in the system (design or implementation)
Attack: action of an intruder which exploits a vulnerability, threatening an asset as effect
Asset: target of an attack

A faulty program:
1. does not do something required by specification
2. does something forbidden by specification
3. does
something not mentioned by specification
4. does something not mentioned in specification, but which should be
H. Thompson, Why Security Testing is Hard, IEEE S&P, 2003

Localization of security vulnerabilities (NIST, cf. Agarwal/McAfee):
- 41% application code (server-side)
- 36% application code (non-server)
- 15% operating system
92% of vulnerabilities are in the application, not the network.

[Figure: development artifacts: requirements and use cases, design, test plans, code, test results, field feedback.]
G. McGraw, Software Security, IEEE S&P, 2004
http://web.securityinnovation.com/Portals/49125/docs/19AttacksforBreakingApplications.pdf

Finite automata and regular expressions, 29 February 2016

Problems with finite automata and regular expressions

Exercise. Construct a deterministic automaton that accepts all strings of a and b in which, if bb appears, aa has not appeared before it.

We reason as follows: any string that does not contain aa is fine, but if aa appears, bb can no longer appear later, otherwise the required condition is violated. So we first build an automaton that recognizes the strings in which aa does not appear: any string that ends before aa has appeared is accepted.
[Figure: automaton for strings without aa.]
After any a, a b must follow in the string in order to continue. Or, put differently, we count how many consecutive a's we have had immediately before the current state: 0 in state s0, one in state s1. So with any b we return to s0. From s2, however (two consecutive a's), the automaton accepts nothing more.

We now complete this automaton so that, once it has seen aa (from s2), it no longer allows bb. The roles are now reversed: after any b, an a must follow in the string in order to keep satisfying the conditions:
[Figure: the completed automaton.]
All states of the automaton are accepting: wherever the string stops, it satisfies the conditions in the statement. There is no transition on b from s3: implicitly, such a string is not accepted. To rigorously satisfy the condition that every state has a transition on every symbol, we can explicitly introduce a non-accepting state in which, once reached, the automaton stays regardless of the input.

Another solution is to
first define an automaton for the strings that violate the condition in the statement: we can easily build a nondeterministic automaton that accepts if it first sees aa and, later, bb:
[Figure: nondeterministic automaton for strings containing aa and, later, bb.]
At the beginning, between aa and bb, and at the end, the string may contain any combination of a and b. We then determinize this automaton (which accepts all the unwanted strings) and obtain its complement (the wanted strings) by changing every accepting state into a non-accepting one and vice versa. Check that you obtain a deterministic automaton equivalent to the one built directly above.

Common errors

An automaton recognizes (accepts) strings of a certain kind. If we do not mark states as accepting, it will accept nothing. There is no implicit accepting state (for example the "last" one: there is no "last" state in an automaton with cycles). Nor does it have to be unique: the definition has a set of accepting states.

An automaton consumes a single symbol on each transition. So a transition cannot be labeled ab; two transitions are needed, with an intermediate state.

An automaton with a state that has two transitions for the same symbol is not deterministic, so it is not acceptable if a deterministic automaton is required. (If it was well thought out, however, it can be determinized.)

An automaton with a self-loop returning to the same state for all symbols of the alphabet (like s0 in the last figure) can consume any string while remaining in that state. In a deterministic automaton, since it already has transitions on all symbols, such a state cannot also have transitions to other states. So it only makes sense, if at all, as a final accepting state or an error state.

Discussion

It is useful to distinguish a few typical classes of "good" (accepted) strings. We can have the cases:
- once a string has become "good", it stays accepted, for example strings that contain a given pattern: once the pattern is found, anything may follow
- once a string is not good, it can no longer be accepted, such as strings that do not contain a given pattern: once the pattern appears, anything may follow, but the string can no longer be
accepted
- we can have an alternation of accepting and non-accepting states, such as strings with an even number of 1s: each 1 read moves the automaton from an accepting state into a non-accepting one and vice versa

Once we have written an automaton, it is useful to walk through it to see a few accepted strings, paying close attention to cycles: by combining cycles we can obtain situations that may have escaped us initially.

Logic and Discrete Structures, Lecture Notes 1, Marius Minea

Exercise. Write a regular expression for the strings of a and b in which every ab is immediately followed by a.

We denote alternation by |. The highest precedence is that of *, followed by concatenation and alternation.

We can think about the solution in several ways, but in any case we must express what strings look like once ab has appeared. Any ab must be followed by a; in particular, it could be followed by another ab (followed in turn by a), but we can also have any number of a's between two consecutive ab. We arrive at the regular expression (ab|a)*. If it ends with ab, it must be followed by a; the empty string is also valid, however: ε|(ab|a)*a.

Before the portion where ab can repeat, we can have any number of b's followed by any number of a's; after this portion (so when b no longer appears) we can end with any number of a's. The repetition of a is, however, already included in the expression found. It is therefore enough to complete it as: b*(ε|(ab|a)*a).

We can also think as follows: once the first a has appeared, we can no longer have bb, only ba. We arrive at (a|ba)*. Again, this includes as many a's as we wish, at the beginning and at the end. It is therefore enough to add any number of b's initially: b*(a|ba)*.

Another variant, perhaps more laborious but safer, is to first build an automaton.
[Figure: automaton with states s0, s1, s2.]
Initially, we have a string of b's that introduces no constraints. Then, any b from s1 means it was preceded by an a, so it must be followed by an a, returning to the same state.

To obtain the regular expression, we easily see that we can eliminate the intermediate non-accepting state s2. We obtain:
[Figure: automaton after eliminating s2.]
Since both remaining states are accepting, the language
becomes b*(ε|a(a|ba)*). Noting that the a in front of the inner parenthesis can also be generated by the repetition, and likewise ε, we arrive at the same simplified form, b*(a|ba)*. We can see the simplification better by noting that replacing a with ε on the transition s0 → s1 yields the same effect, and ε is a neutral element (it does not matter) for concatenation.
[Figure: the two simplified representations, with transitions labeled b, a, and ba.]

Common errors

The regular expression (a|b)* can generate any string of a and b. As with automata, if we want to generate strings that do not contain a pattern, we cannot have (a|b)* as a subexpression, because it can generate anything, including the unwanted pattern.

Although regular expressions are equivalent to automata, if the problem asks for a regular expression, that is what we must give as the solution.

Discussion

The two representations on the right are not automata (neither deterministic nor nondeterministic) according to the given definition, because automata cannot have arbitrary strings on transitions, only a single symbol each. They can, however, be useful informal representations for the transformations through which we obtain a regular expression.

Logic and Discrete Structures, Lecture Notes 2, Marius Minea

A Probabilistic Property-Specific Approach to Information Flow
Daniele Beauquier (1), Marie Duflot (1), and Marius Minea (2)
(1) University Paris 12, France, ...duflot}@univ-paris12.fr
(2) Institute e-Austria Timisoara, Romania, marius@cs.utt.ro

Abstract. We study probabilistic information flow from a property-specific viewpoint. For a given property of interest, specified as [...] of basic security predicates from which common notions of security properties can be constructed. In the same view, that an analysis of information flow must be flexible enough to be adapted to the specific features and needs of the investigated application, we propose a parameterized view of information flow that develops a quantitative, probabilistic approach [...] flow with respect to a property (a set of system traces, possibly abstracted in its low-level part) which is deemed important for the system under scrutiny.
The tshl oiven -nan>p d iraеа^п^-^О^ Ю^^^пк^і-^с^^ high-level information flow, ір which hrooert-espaepeteof soywencoe ol' nighdevel events, and sequential infocmation flcw, io wOicSi prafe-diat can deccribenat oe-y sequences of high-lnyciepewtabutc-yo how hc't Р'С by the low-level, following tdo viowat -2] in examining inCatmeciap (fi ac, we oansider Swo oiswe on -Pesoqeenee ol events in a trace in the PrsLag oc-  wiaw, js^tqac'tie^a^^ simy-y satsad Srtces (infinite sequences ot evppSs^Aitcswattpelyjin t rsicb-a-zed e-ew, the pi-crciii timepoint splits a trace intoa —wi-ci o finile сефіероеоР pdfteysybt ynd an infinite sequence of-yf-iae c-cenis iii thisat-w we tfin cxptecr properties tbac link the past behapiorwiththafuturebcyay-or of tho sceionn we Оуьу уУп-ycpof information flow if fiieli a beltapiot soi isaquiyroycbia regsrdless of the low-lepel obserpation up to the currenc Simepoind iwr htopatfymah chite that if the last epent beiote the tim-paint if a thao che next eycnt is, and Ю the last epent before thetimepyintisS fhao tOe oext epent caoyof bc a' We then gipe chareetericatientoc soscsms liail aee^ecm'er^ceoayn^jiitpthi^se piews of informationflcw desccibing (hastcucinre о- licit rracc sete in tccmned high low-lepel epenteinOtecir viob;iblli| 1'— Using this iraiiioiaarS anC fiPcosm- орргорпоГо sita ol'[>-ientscvl we cnc express seperal clastical-O'tiaWoes af possibi-istic securiSwi gepsrtliocW yomedar-ference , ііоіііііііроос' [fh] and s^iaasab^^^fry [13 ,Pii the sametime, Cy suy-porting a user-defincd choice si' psohетttes,we allow a liner -руіісііп-ііо tor cc' definition of information flowthan psp c-cs-i ilppco;>chotl inaydifif^-octystsmt 208 D Beauquier, M Duflot, and M Minea that are not secure according to one of these notions, the probabilistic approach allows us to give a quantitative measure of the appearing information flow An important issue when defining security properties is deciding what kinds of information flow are acceptable in some 
existing definitions of information flow, such as noninference or the perfect security property , covert chan-nels already existent in the lS)iiaWi system are allowed, such as auditing or copying low-level events on a high-level Such definitions take a causal view, defining information flow as the factthathigh-level beha viorinHueneos laiv-kvnS behavior Conversely,thfs том^ that viasena a gtrihf of low-leael tvootsmay allow us to deduce somethirg afiosl tfiehihli-levelevenes ttiof have oigiim'dio the past, prior to thpseobrrfvnSfons in contrast, we nahe acerafyhStvwtftlscaal viewnThvt, ifo faw-levelohvtfn vation is compatiblsoaljr with an interlesvmg olhigVilevei waenta,but noi with another, this constiSutes informitieti i'loitr-egaisllese whethar th-c keaw-edhufs already present in thedeseripSiou aira-n st) oevhp sysnem endeed, thevftba-bility of a given inl or ivatli sp of ^ОАеу,! eveiitrdepenVfin thissituailonan ths low-level observatio^wh-ch soiTospaiidsia ѵіі-Ьоі'іііОіоііоі'ііііоіііііі ionfiow Related Work Work on tailoring шсие^о propcsiiee tothrthstemunnst соозОАегоиог агку inates with the stringa^t diflereut |Ш1еИ|ои^^1ги1 llUomollioli^'low [V, 1 f, 14,it] Following the recognitlon tfat sevuslty ls aprapertyolttagt sets ratavethan traces (e g , ), in [t8g tecurilyprohsrtiev ave holinedunifosm-y Аув^сИ^^ a predicate that the low-level equivalent bunch of a trace has to satisfy The approach is taken further in by defining basic security predicates in terms of a restriction and a closure requirement on a trace set The parameterization in the latter paper is given by ths 'eorianSs iiic-ilet tina liaeli-aperiVians otmtr^s^^ng and deleting high-level avinte nea [oase to itf'i'pllKttaberiK'f' aiul psnu'iuts respectively, confidentlai) sen Vt pi'ihornirb Probabilistic information flowhassaturalia bnen most illlliii fii o trsat l isiii the possibilistic veosiom  0Woevi inSroducesthe ftosi mtStl which dittioa guishes mere correlvflon Vraiii actup-caunal influente 
Gsau intraducee ptolo abilistic interferencnin a (DsS'al of finiSa stafomachines a isl gloata mare vontoni information-theoretic iraiiowart intisding ]o isSo-il>tislns, ireal general, system-independenl tal Sri it in^c^nnatfoo flow -^Vtsmtp^s^rO whicC |>;v tiiitlilb A Probabilistic Property-Specific Approach to information Flow 209 izes information flow is defined in by giving a definition of secrecy in multi-agent systems, using a modal logic of knowledge in a state-based model This generalizes several existing approaches and can be extended to probabilistic security Their parameterization stems from defining formulas (knowledge) of what must be kept secret, thus providing a fine-grained way of characterizing security requirements Since the approach isstate-based,ourmodelappears complemen-tary in that it can talk about both past and future evolution of the system Other perspectives on information flowinclude that of whichoffers ti variety of characteoiwaCions oinon-lnCwrferenc^exprcssad ns llotre fagics ond CTL; however, the vaeieiy is nit given byp;ii;-niiiSoriziOion boC languagcvd-pects such as sequcntiei vs eoncnrinin gsiermination ssnsitlviCy CSoserto a parametric view is Pio apprvaehof g4], wlu'pg tht ]twramcier isat oboesnable property (an abstractinn)cf tho piiClie observaisons oi o p-vgeam Thus, tht at-tacker is a data-flow goatyzesl andecn Pe spesi-ied Sn aa obsictct ii-iV[iitSiniisi framework Both approaeheg deglwibiimuch mats specific systemt, ntscriheb Sa particular programming lonaucseg, ind VPc slass nf expicnegdpeapertiusOhough parameterized to some exSsot,tenob as рі'Ппуі|і Beyond the розпіЬііізі-с iipi>rivicie'ti oncSyccs quanthaiivehieimciiion flow for a simple imaeiePee lantuaoe Srnm a semantic point ol hvgw, hVeseg-p l^i replace indistinguisagbiliin ia tlie lomiclisdion nl nondiil oslcivgto by sintihisg^sr based on the notiontl gislanteAns p-acess-aigebtaic setiingi in compariso^ we also define quaatitp ol' SnroemaSionflnw hvsedon i is Pistsnce 
between the probability of a property given two observations.

Our approach to parameterization allows properties that range from the general to the entirely system-specific. Thus we can choose the granularity (a particular trace set or even a single trace) with respect to which information flow is analyzed. Alternatively, quantifying over classes of properties, we can still obtain and reason about some of the classic definitions of information flow.

Paper Outline. We first introduce the mathematical model of probabilistic event systems which we use throughout the paper. Section 3 gives property-based definitions for three classes of probabilistic information flow, and theorems characterizing systems that conform to these notions. These results are extended in Section 4 to properties which distinguish between past and future with respect to the reference point defined by the observation. Section 5 shows how some of the classic definitions of information flow can be expressed in this formalism.

2 Probabilistic Event Systems

Notations. Given a finite alphabet A, we let A* (resp. A^ω) denote the set of finite (resp. infinite) sequences (or traces) over this alphabet. The set A^∞ is the union of A* and A^ω. The empty sequence is denoted ε. Given a sub-alphabet A' ⊆ A and a trace λ, λ|A' denotes the projection of λ onto this sub-alphabet. If λ is a finite non-empty trace, last(λ) denotes the last letter of λ.

Let λ be a (finite or infinite) trace. We denote by Pref(λ) the set of finite prefixes of λ. More generally, if Tr is a set of traces, Pref(Tr) = ⋃_{λ ∈ Tr} Pref(λ).

Let u, v ∈ (A*)^n, u = (x1, x2, ..., xn), v = (y1, y2, ..., yn). We denote by u ⊙ v the simple interleaving of u and v, defined as u ⊙ v = x1 y1 x2 y2 ... xn yn. If U, V ⊆ (A*)^n, we denote by U ⊙ V the set U ⊙ V = {u ⊙ v | u ∈ U, v ∈ V}. If U, V ⊆ (A*)^ω, the definition of U ⊙ V is extended in a standard way.

The interleaving of two sequences x, y, denoted by interl(x, y), is the set of sequences {x1 y1 x2 y2 ... xn yn | x = x1 x2 ... xn, n ∈ N, y = y1 y2 ... yn, xi, yi ∈ A*}.
This extends to setsofsequences: шУег7(ПХ,У) = {intег=х,у) х:П=}Хру PE Y} Probabilistic Event System The execution of a system is modeled by ils set Tr ol ttaces whixh are finite or infinite sequences of atomic events from a setE A partieular atomfrevent p is distinguished which reprexent^sll s hefrixg of thxsyxtem For example, if A is n sequence of atomic ei^en   i is useful to distinguish =etween AA has жcuned but the system still executes",xed has xccuernxlexd the system has stoppeT’ The latter case is modeled by the event Ат To unify the presentation, it is convenient to use only infinite sequences, writing Атш instead of Ar Then, from now on, Tr is a set ol'iiiHiiiln eepuensee eiiiHi do notcontam enyoccurrencexi t except when theyare ol iheloimAr wheee A xeid riemis occurrencs nt e The set of atomiceventeiSls disedsd mtoiwodissxisS tets, rhesxt H xfOigh-level atomic events xellle' set iof low-leiel oxes i ) eaz>x-) probabilily (Wx assumx that ocory ртхйх -P a iraeeix TShes a nan-exуnpsoaabilltyil Traditionally, a^e^^O Ss a mxasxsablx sot in thx thxory of probabilitixs, so to avoid confusion, the atomii ehexssxf thx syxbemwlll To salled etiione Wx usx thx custoniaty noteXioiiS >r aoisHlaniti iiixtCUdiirs: if P neS Qaxc two mxasurablx xvxrSx aeX Fr)Q) X 0, hhs cexdiiional [mPiXilHyi^riiiSlj is Fr(FnQ) Fr(Q) Sincx wx arnintefxslePocln islrecxt у) iiisystomi? 
we dcnl only with conditiooal у-obabiliie's reheSiv^Sx lhr X linx^, foxeach m 01 s r s r s >> | and H (resp L) is the yet of hish-level (resp low-ievelf actions, Tr is the set o traces of the system, and pis a probabilistic m ea sere on Tr We assume thapnnly irw-lsner actioeeaax pbxetуxb)e on |p>' ^w-teeie |^ Xs for a trace A the ргхр>аЬапА|е bobeervalilo ity 1 iS high- anU inw-lenel events Another oxainplo to oativai огі и Let Lq = L   {t’ We write E = (H OLo U ( Th Lq)*-^ for tbn ort olf^lm^t^ilo wntOs foomed by actions from H coS U This is aoopersaS ol tho setol rystem Soaaec: Ou aiO in Tx starting fram n isWelleW Ьурі^ііЬрпі6 lalele) ie ^^n^atlu tbeprababilito that the sequence оі'ііІ "1ііГ'ѵ tac;bililoUii> fram x the sequence od not ions a faliawed by a low-i A Probabilistic Property-Specific Approach to information Flow 213 A node has the color of the edge ending in this node The root is red Two red nodes x and x' of T are H -equivalent if there exists an integer n such that the labels of the paths from the root to x and x' are respectively aiaia2a2- anan and aibia2b2- anbn where og G H* and ai,bi G L We also need to state an equivalence property on L Two nodes x and x' of T are L-equivalent if there exists an integern such that the labels of the paths from the root to x and x' are respectively 01010202    an^ian^ianan and  Зіяі ?2Я2 - Зп-іЯп-і ?пАп where o,,- G H* anda- G L A tuple (x, x y,y')of red nodes of the treeT is g-, L-aempetible if x-nd x' are JF-equivalent, the ipehtity of low-level events is abstrss-Pal out, and only their position in the sequence of events is preserved Theorem 1 A probabilistic so^it-e t su-h ifat Tr 0 Trn = Hn(Tr)  Ъп(Тг} (2) Every H, L-compatible tuple ,y thegpeeT is^ rfect (3) For every pair of Н-е-им-е!-!- nodes =,x' =0T, theprotabilistiat-eef УП atoh Tx> are isomorphic (4) For every n > 0 (Еп^Те) 0 Prs(Tr П (H* = 0) The intuition behind lSis гооа if the system list no i-formatiot bow, theo weptxve (t^i, t4) ond, b, 
оьп-tradiction, the existesro of tht ramo edgstm ахрІО-  in yti For dtt lgtt-r 0 and Pr(P|B(w)) = 0 The probabilistic parts of (2) and (3) are proven by contradiction as well, assuming that there exist nodes with different ratios, considering the pair of nodes with the highest ratio and obtaining information flow for some property The converse is proven by considering basic cylinders for which it is possible to show that there isno iiilorniadoii flow Then we define measurable subsets Pn which are disjoint unions of cylinders and we prove that there is no information flow for these sets Taking the limit of these sets we show for l ic abs 0 and Prs(P | B(wa)) = O itrereГоге Bham ernaral mformrtionrow T To our kiio vl 0, and in tht case Prs(P, тп) = Prs(P, rm) fos all potitivs iniegers m, n for every relvtivized general property P We conclude tht the evstem S has no relatlviaeV general information flow Conversely, suppose lleil l lie ppp-eetiun ue TpertiL ceptaips a tsace 'p  rbi Then the first actioa a of exis diffeeentfrom T,othrrwise гх touH be e ea to тш Consider thepaopertyP = 0(Vlal72) -e* xE7 |7i|L = t} Wehave Prs(P,a) > 0 and P},s(P,e) =0 Sherelorr S" has r aelaeiaieedgeneralmfor-mation flow   The next theorem Haiisieinrieee the shsleclt til Ь ruve thatO= = U O(M c {e}) Suppncc that there exists 0J02 a "(Uo) cnO samo taG P tocit haCoiao2pd"p Thpfl there is informationflow for the propbrtoF = {cO}x fH U {({fi hrOfP,c) 0 and there exists b G Msuchfhcf PogflP, b)fp 0 Poaofng tir opfler eonOiiioraaf (2) is straightforwaodi folfowing ste s of tLefrootof Theorem 1   The absence of re ntivioedfequeiotiolinformptfoe flnwis s a=r v soran= pyop-erty, and as seen from tio' ooii lil iono in Theorem^, aery fev profo-ibiNs m ipeni systems have this [rno ieriy, Tfoo stems fram the kiti =ta{, inoxjatesving tha property P, a tractic spht into ivo paris, jua{after the occorrence od e lew-level event if it is jteotfljlo te obrorccmore ( har foadevefoiv=iyns in a tracy than 
specified in tho iroporlyl fo'fe ir informetfooflcw But it is still intereotmgto caaeicfop low- evel toacso of the sama >ап°Поа-, and examine if they givoton^s onditianaliugh-lepelentoematiea ЬеоИае the faot that n low-level events hcne occoored) Weeoe thenmtefeefeO in a = > for ever= a,o >t om tuch і"м" |м| = |г>| and B^ufBifl) arenon-empt" in order to charactcnipe tLr cvol foe;•iЬilifiii  tree T of the system, "Пс reaa=LC оі a noda is Ohe imin-oi oOetOedgea on Ote path from the root to it Theorem 5 A syolem S suchihat To fC m^ io wiihvutsequontiMirelatioined information flow at eaV fixot step rff A Probabilistic Property-Specific Approach to information Flow 217 (1) Va > 0 Trn = Hn(Tr) 0 Ln(Tr) (2) 'in > 0, all nodes of red depth n with outgoing red edges are equivalent (3) For every H-equivalent nodes x,x' ofT(S'), the probabilistic trees Tx and Tx> are isomorphic The proof of this theorem is Pased on the imuna given below which links sequential relativized information flow at each fixed step with sequential relativized information flow Then we can reuse the proof of Theorem 1 Lemma 1 Let R 2 be a sequential property on rares whereR (3 fH U Г)* and = i)"    l)* Then, for PR = Ши, 72) i І7i|(i = arflastlhnh = for еѵегг  и of lengpr n we have Prs(PR,u) = Prs{R ' B(u')') 5 Comparison with Some Classical Security Properties in this section we restrict ourselves to finite systems, for which Tr С (ЯUL)*тш, and we suppose that teL Denrte by E the set H=Ly, wyere fy = U  {r} We identify an element of Tr with its shortest prefix ending with the action t Given a trace A and ti syctem S, thr loudeveluerr obaorѵт{А| от car congtruct the set of system traces which correspond to the same observation, the low-level equivalent set of A: For A e E*0{t}, LLES(X,S) = { 3 e Tr | A,Lo = flLo} We will show that separabiiiiy, oooiete^rfirenoeiona noeinforenra pan br ox-pressed in our frameworr andcorretpoed ee the absence of infoomatt on flow for some classes of propertier ep e s e e 
Noninference is a security property introduced by O'Halloran. It requires that every trace $\lambda$ of the system admits its projection $\lambda|_{Lo}$ in its low-level equivalent set. As a consequence, a low-level user cannot deduce from an observation the existence of any occurrence of a high-level action:

$$\mathit{Noninference}(S) = \forall \lambda \in Tr\ \exists \delta \in LLES(\lambda, S) : \delta \in Lo^*$$

Consider the property NonInf: a trace satisfies this property iff it does not contain high-level actions. Thus this property focuses exactly on the (non)existence of a high-level activity. It turns out that noninference can be expressed in terms of information flow for the property NonInf.

Theorem 6. For a probabilistic system S, Noninference(S) holds iff $Pr_S(\mathit{NonInf}) > 0$ and there is no qualitative general information flow for the property NonInf.

Proof. Suppose $Pr_S(\mathit{NonInf}) > 0$ and there is no qualitative [...] but $Pr_S(\mathit{NonInf}, v) > Pr_S(\mathit{NonInf}, w)$, a contradiction.

[...] $Pr_S(\mathit{Sep}_{\mu_1,\ldots,\mu_n}) > 0$ and there is no qualitative sequential information flow for these properties.

Proof. Suppose Separability(S) holds. Consider the property $\mathit{Sep}_{\mu_1,\ldots,\mu_n}$ for some $\mu_1, \ldots, \mu_n \in H^*$. Suppose $Pr_S(\mathit{Sep}_{\mu_1,\ldots,\mu_n}) = 0$. Let $a_1 a_2 \ldots a_p$ be the projection on L of some trace in Tr. If $p \geq n$ then $\mu_1 a_1 \ldots \mu_n a_n a_{n+1} \ldots a_p \in Tr$, and if $p < n$ then [...] $\in Tr$, which contradicts $Pr_S(\mathit{Sep}_{\mu_1,\ldots,\mu_n}, B(u)) = 0$ [...] is non-empty since $a_1 a_2 \ldots a_n$ [...]. Therefore $Pr_S(\mathit{Sep}_{\mu_1,\ldots,\mu_n}, u)$ must be equal to zero since there is no information flow for this property.

5.3 Noninterference

Noninterference is a security property introduced by Goguen and Meseguer and generalized by McCullough. It demands that a low-level user cannot infer that any sequence of high-level inputs has (not) occurred. Let $HI \subseteq H$ (resp. $HO$) be the set of high-level input (resp. output) actions. We have $HI \cap HO = \emptyset$.

A Probabilistic Property-Specific Approach to Information Flow

$$\forall \lambda \in Tr\ \forall \gamma \in interl(HI^*, \lambda|_{Lo})\ \exists \delta \in LLES(\lambda, S) : \gamma = \delta|_{Lo \cup HI}$$

For each $\mu_1, \ldots, \mu_n \in HI^*$, $\mathit{Noninter}_{\mu_1,\ldots,\mu_n} = interl(HO^*, [...]) \times (H \cup L)^\omega$. In a similar way to Theorem 7, one can prove:

Theorem 8. For a given probabilistic system S, Noninterference(S) holds iff for each n and for each $\mu_1, \ldots, \mu_n \in HI^*$, $Pr_S(\mathit{Noninter}_{\mu_1,\ldots,\mu_n}, u) > 0$ for every $u \in L^n$ such that $B(u)$ is non-empty.

6 Conclusion

We have studied probabilistic information flow from a point of view parameterized by user-specified properties of interest. A property is a set of system traces, possibly tailored to the particularities of the system under analysis [...] into high- and low-level events and a single definition of [...] policy may not be [...] on the property. An issue for future research is to apply this framework in the case where systems and properties are explicitly given as Markov chains and regular languages, respectively, and to investigate the decidability of the [...] of information flow in this setting.

Acknowledgements. We are grateful to Anatol Slissenko for the numerous and [...]

sqr : Z → Z, sqr(x) = x · x

This function definition has two parts: the first specifies the name of the function (sqr), its domain (the set of integers, Z) and its codomain, also Z. The second part specifies how the value of the function sqr(x) is computed starting from
the value of the argument x.

In C, the same function looks as follows:

    int sqr(int x)
    {
      return x * x;
    }

The first line is the function header, which plays the same role as the declaration sqr : Z → Z: it specifies the name of the function, its result type (the word int before the function name) and, between parentheses ( ) after the function name, the names of the parameters, each preceded by its type (here a single parameter x, also an integer). The header is followed by the function body, written between braces { }. Inside it, the keyword return indicates the expression that gives the value of the function, using the parameter x; the sign * denotes multiplication.

A second function
Let us now express the squaring of a real number. Mathematically, we write: sqrf : R → R, sqrf(x) = x · x
The function sqrf is different from the function sqr written earlier, because it has a different domain and codomain. Strictly speaking, the multiplication operation is also different, being defined on another set, even though the same notation is used. Likewise, in C we cannot use the function defined earlier on integers to compute the square of a real number; we have to define another function:

    float sqrf(float x)
    {
      return x * x;
    }

The word float, used for the domain and the codomain, denotes real numbers.

1.2 Types and operators

Types
In programming-language terms, we say that int and float denote types. A type (or data type) designates a set of values together with a set of operations defined on those values. So we cannot simply equate the type int with the set Z and the type float with R. A closer analogy is the one between a type and an algebraic structure, which likewise groups a carrier set with a set of operations, for example the ring (Z, +, ·) or the field (R, +, ·). Just as multiplication is not invertible on Z but is on R, there are differences between some operators for integers and for reals in C.
Another difference from mathematics is that in C both integers and reals have a finite range of values, and reals also have finite precision. The details, and what this implies for computations, we will discuss
later.

Integer and real constants
Numeric constants can have integer type, or real type if they contain a decimal point. Thus 2.0 and 2 represent the same number but have different types. Constants can also be written with a power-of-ten exponent; these have real type even if they contain no decimal point: 5e2, 5E2 and 500.0 represent the same number. According to the standard, constants have no sign; -3 and +4.1 are expressions with unary operators.

Programarea calculatoarelor (Computer Programming), course notes, Marius Minea. Introduction, preliminary version, 3 October 2011.

Arithmetic operators
The arithmetic operators + - * and / have the same meaning, associativity and precedence as in mathematics. Parentheses ( ) can be used to group subexpressions as desired. For example, for the fraction x/(mn) one must write x / (m * n) and not x / m * n, which means (x/m) · n.
Two aspects differ between integers and reals: division / yields a fractional result for reals but is performed with remainder for integers (the quotient is an integer), and for integer operands there is also the operator % (modulo), which is not defined for reals. Thus 7.0 / 2.0 is 3.5, but 7 / 2 gives the value 3!
When signed integers are divided, the result has the same absolute value as for positive integers, and its sign is given by the usual rule, so 7 / -2 is -3. The sign of the remainder (obtained with the operator %) is the same as the sign of the dividend. Thus a / b * b + a % b equals a (the equation of division with remainder). Examples without sign: 9 / 5 is 1 and 9 % 5 is 4; with sign: 9 / -5 is -1 and 9 % -5 is 4; -9 / 5 is -1 and -9 % 5 is -4; -9 / -5 is 1 and -9 % -5 is -4.
The operators + and - also exist as unary operators, with higher precedence than the binary arithmetic operators, so we can write expressions such as -x + -2 or 2 * +3.
Every expression also has a type, determined by the types of its operands and by the operators: arithmetic operations on integers produce integers, and those on reals produce reals; type conversions are discussed later.

1.3 Terminology: syntax and semantics

Syntax
The terms used to discuss the structure of programs written in programming languages are similar to those used for texts written in natural language. A program is made up of a sequence of lexical elements (tokens), the smallest units with a meaning of their own:
• keywords (int, return): they have a meaning predefined by the language standard (they cannot be redefined)
• identifiers: names given by the programmer to functions, parameters, etc. An identifier is a sequence of characters made up of uppercase and lowercase letters, underscores and digits, which does not begin with a digit and is not a keyword. Examples: sqr, x, main, printf, N_1, exit. The names of library functions are identifiers and could be reused for another purpose, but this causes confusion.
• constants (integers: -12, reals: 3.14; characters and strings are discussed later)
• punctuation: operators, separators (e.g. , and ;), parentheses ( ), braces { } etc.
C distinguishes between uppercase and lowercase letters. Thus nr, NR, Nr and nR are different identifiers!
All keywords are written in lowercase, as are the vast majority of the standard functions.
In a program, lexical elements are grouped into more complex language constructs, following certain writing rules that make up the syntax of the language. We gave as an example the definition of a function, made up of a header and a body. The header of a function (in general with several parameters) has the syntax (form):

    result-type function-name ( parameter-type parameter-name, ..., parameter-type parameter-name )

The body of a function contains, between braces, statements: language elements that specify actions to be executed by the program. So far we have used only the return statement, with the syntax:

    return expression ;

The expressions encountered so far are built syntactically by rules similar to those of mathematics. All operators are written explicitly: instead of 2n or n(n + 1) one must write 2*n and n*(n+1), respectively.

Semantics
The syntax of a language describes only what a correct program (or fragment) looks like, not what it means. The semantics of a language (of a programming language, of a language construct, or of a program) describes its meaning (the effect it has).
The return statement has the effect of evaluating (computing the value of) the given expression, which is returned as the value of the function for the arguments received. Execution of the function ends and control returns to the place from which it was called (aspects detailed later).

Program layout
In a C program, whitespace characters (with no visible representation on screen, such as the space, tab and newline characters) are separators between lexical elements. They are needed only if omitting them would merge two adjacent lexical elements into another valid lexical element, changing the meaning of the program. Thus the function sqr defined earlier can be written

    int sqr(int x){return x*x;}

by eliminating spaces. The 3 remaining spaces are needed as separators, in this case between keywords and identifiers. The space is not needed in the expression 3+-5 but it is needed in 3- -5, because -- would have a different meaning
(decrement) in C.
It is recommended to follow some conventions for better readability of C programs, such as aligning successive statements one under another, and indenting a sequence of statements (usually by two spaces to the right) relative to the enclosing braces { }.

1.4 Calling functions
We define a function that computes the discriminant of a quadratic equation, a · x² + b · x + c = 0. It has three parameters, separated by commas, with the type given for each parameter separately.

    float discrim(float a, float b, float c)
    {
      return sqrf(b) - 4 * a * c;
    }

The call expression
The expression that computes the value of the function contains the expression sqrf(b): the function sqrf is called (used) to compute the square of the parameter b. The syntax is the same as in mathematics:

    function ( argument-expression, argument-expression, ..., argument-expression )

where the arguments of the function, separated by commas, can be arbitrary expressions, unlike the formal parameters in the header, which are identifiers (names). Also unlike the header (but as in mathematics), the type of each argument is not stated explicitly, but it must match the type of the corresponding parameter declared in the header. Thus we used the function sqrf and not sqr, because the argument b is real and not an integer.
The expressions given as arguments are evaluated to determine the argument values for which the value of the function is computed. Any expression can be used as an argument of a function call, including one that contains another function call. For example, starting from x⁶ = (x · x²)² we define:

    float pow6(float x) { return sqrf(x * sqrf(x)); }

Declaration and use
The function discrim is correctly defined if the meaning of the identifier sqrf is known, for example if the definition of sqrf appears in the program text before the definition of discrim (likewise for pow6). In C, the meaning of any
identifier must be specified before it is used (it must be declared, an aspect treated in detail later), just as in a mathematical text every symbol must be defined before it is used. For example, the parameters of a function are declared in its header, where their names and types are given, and they can then be used in the function body.

1.5 The main program
So far we have only defined and called individual functions, without writing a complete program. The C standard specifies that in a hosted execution environment (for example under an operating system), the function conventionally named main is called when the program starts running. This function has type int and returns to the execution environment an integer value that the environment can test to determine whether the program run ended successfully. By convention, successful termination is signalled by returning the value 0 (zero) from main. Termination with an error is signalled by returning a nonzero value, which can represent a code identifying, by various conventions, the type or cause of the error.
With these clarifications, the simplest C program can be written:

    int main(void)
    {
      return 0;
    }

The program returns the value 0 (successful termination), and nothing more. The keyword void denotes the empty type (with an empty set of values). It indicates that the function takes no parameters. We will see later that main can also have parameters passed by the execution environment.

1.6 Printing text
A program has an observable effect only if it causes a change (called a side effect) in the state of the execution environment (its surroundings), for example by printing (writing) a message. In C, writing (and reading) is also done through functions. The function printf allows printing in a variety of ways by specifying a format, or pattern. In its simplest use, the call printf("salut!") has the effect of printing the text salut!
on the screen.
Syntactically, this is a function call, just like sqr(4), with a character string as its argument. In C, a string constant is written between double quotes " ", which are not part of the string but only serve to identify it and separate it from the rest of the program source text. The function printf returns an int, namely the number of characters written, so the expression printf("salut!") has a value, 6. Usually we are not interested in this value, only in the effect of printing.

Library functions and header files
The function printf is a standard library function specified by the C language standard. Its declaration (giving its name, type and parameters) is provided in a header file named stdio.h (from standard input/output), present in every implementation that conforms to the standard. The declaration of printf, required before use, is included with the line

    #include <stdio.h>

This is a directive, indicated by the character # at the beginning of a line, addressed to the preprocessor, which processes the program source before compilation proper. The effect is as if the contents of the named file, stdio.h (with all its declarations), were included at that point in the program.

1.7 Sequencing
We can now write a first complete program that has a visible effect and prints a text on the screen:

    #include <stdio.h>

    int main(void)
    {
      printf("hello, world!");
      return 0;
    }

This simple program highlights a fundamental aspect of programming, namely sequencing. In general, the body of any function (in this case, main) contains a sequence of statements, each representing actions to be carried out by the program. Statements are executed sequentially, in the order given by the program text, except for the cases (discussed later) that explicitly change this behavior. For example, the return statement ends the execution of a function regardless of whether other statements follow in the function body. The program above has two statements, whose effects are:
- the call to the function printf
has the effect of displaying the text hello, world!
- execution of the function main (and of the program) ends, returning 0 (success) to the execution environment

The expression statement
The first statement in the program above is a function call. In general, a statement with the syntax

    expression ;

has the effect of evaluating the given expression. In the case above, what matters is the side effect of evaluating the expression, namely the printing; the result of the expression (13, the number of characters written by printf) is ignored and not used further. The statements sqr(4); (if the function sqr is declared), 2 + 3; or even 3.14; are also correct, but they have no effect: the expressions are evaluated, but their values are not used, and neither printing nor any other visible effect takes place.
The sign ; does not separate statements but is an integral part of them (each one, including the last, would be incorrect without it). So in C, ; is not a statement separator but a statement terminator.

1.8 Comments
The structure of a C program is subject to strict rules (the syntax of the language). Still, it is useful to annotate the program text with explanations in free format. C allows two kinds of comments:
- starting with the characters /* and ending with the characters */ (it can span any number of lines)
- starting with the characters // and extending up to the next newline character (not including it)
Comment start sequences are not interpreted inside string constants (comments cannot begin inside a string). The compiler ignores the contents of a comment and only looks for its end. In particular, comment start characters are not interpreted inside a comment, so comments cannot be nested inside other comments. In the sequence /* 1 /* 2 */ 3 */ the comment ends at the first */ group, and the rest, 3 */, is not a valid sequence of C code.
Comments are treated as if they were whitespace; they act as a separator
between two successive lexical elements: int/* an integer */x is equivalent to int x.
In software development practice, commenting programs is absolutely necessary. Comments:
- help others understand the program, or even the author on a later reading
- contain information about the evolution of the program: versions, revisions, additions, corrected errors, etc.
- serve as a basis for documentation (sometimes extracted automatically from comments with a specified structure)
As a minimum, the program should be documented at the level of functions, stating for each one:
- briefly, what processing it performs and what its purpose is
- the meaning of each parameter, restrictions on valid values, and any relations between parameters
- the specification of the function (the relation between the parameters and the returned value or other values computed by the function)
- any side effects (reads, writes, modified global variables, memory allocation)
- restrictions on calling the function (for example before or after other calls or processing steps in the program)

1.9 Printing numbers

Programs with several functions
An example of a program that defines and then calls a function:

    #include <stdio.h>

    int sqr(int x) { return x * x; }       // defined before use

    int main(void)
    {
      printf("3 times 2 squared is ");     // prints a string
      printf("%d", 3 * sqr(2));            // prints an integer: 12
      return 0;
    }

A program can contain several function definitions in succession (and other declarations, as we will see later). The function sqr is defined before main so that it can be called from the main program.

Printing an integer
To print an integer, the function printf is called with two arguments. The mechanism of functions with a variable number of arguments will be detailed later; in essence, the first argument of printf is always a character string, called the format or pattern, because by examining it the function determines how many arguments follow and what types they have. To print an integer value (in base 10), one gives as the first argument the format (the string of
characters) "%d" (from decimal), and as the second argument the integer-valued expression to be printed.

Printing a real
The following example displays a value given by a library function:

    #include <math.h>     // included for the declaration of the function asin
    #include <stdio.h>

    int main(void)
    {
      printf("The value of pi is 2 * asin(1): ");   // text
      printf("%f", 2 * asin(1.0));                  // displays the value 3.141593
      return 0;
    }

The declaration of the function asin (arcsine) is given, together with other mathematical functions, in the file math.h:

    double asin(double x);

The parameter and the result have a real type, but with higher precision than float, hence the name (double precision). The type double (a keyword of the C language) is used by most mathematical library functions, since the precision of float, about 6 decimal digits, is often insufficient even for ordinary calculations.
Printing works as for integers, but uses a different format argument, the string "%f" (from floating point). By default, reals are printed with 6 digits after the decimal point.
Note that the first occurrence of 2 * asin(1) in the program is inside a character string: it is plain text, displayed as such by printf. The second occurrence is an expression, given as an argument to printf: it is evaluated by calling the function asin, and the resulting real value is printed.

Type conversions
The expression 2 * asin(1.0) multiplies an integer by a real. In operations between the two types, the integer is converted automatically (implicitly) to a real. For function arguments, the value is converted to the type of the corresponding parameter, if this is known from the declaration of the function; so we could also have written asin(1). With printf, the format %f can also be used for float, which is converted automatically to double. Other conversion rules will be discussed and systematized later.

End of line
For successive calls to the function
printf, output continues from where it left off. Moving to a new line must be specified explicitly. A new line in a text is marked by a special (control) character, whose printing has the effect of advancing by one line. Since this character has no visible appearance but is characterized by its effect, it cannot be represented directly in the program (a string constant between " " cannot be split across several lines). By convention, the newline character is represented in the program (for example in a character string) by the two-character sequence \n (from newline). The sequence

    printf("one\n"); printf("two");

prints the two words on successive lines, and the cursor remains on the second line. The call printf("one\ntwo"); has the same effect.
The character \ used to introduce the special sequence \n and others must be doubled to represent itself in a string. Thus printf("c:\\windows") will print c:\windows

The general format in printf
Values of several types can be written in the same call to printf. The function printf scans the character string given as its first argument and prints each character, whether represented directly or by a special sequence (\n). On encountering a format specifier, such as %d or %f, printf takes the next of the arguments it received and prints it according to the indicated format (integer, real, etc.). For example, printf("an integer %d and a real %f\n", 2, 3.14) will print

    an integer 2 and a real 3.140000

and then move to a new line. To be interpreted as an ordinary character, the character % must be doubled in the format string: printf("7%%") will print 7%
The match between the number of format specifiers with % in the string given as first argument and the number of arguments that follow, as well as the match between the indicated types and those of the arguments, must be checked by the programmer. In case of a mismatch, the behavior of printf is undefined!
1.10 The conditional operator

The conditional expression
Often the value of a function is computed with different formulas (expressions) on different parts of its domain. An example is the absolute value function for integers: abs : Z → Z, with abs(x) = x if x ≥ 0 and abs(x) = -x otherwise. In C:

    int abs(int x) { return x >= 0 ? x : -x; }    // unary minus operator

In the definition by cases we wrote the condition "otherwise" instead of x < 0. A conditional expression has the form condition ? expression1 : expression2; its value is that of expression1 if the condition is true and that of expression2 otherwise. Conditions can be written with the comparison operators <, >, <=, >= (less than, greater than, less than or equal, greater than or equal), == (equal) and != (not equal). Other operators, as well as details about the notion of logical value in C, will be discussed later.
As examples, we can define, similarly to the absolute value function, simple functions for maximum and minimum:

    int max(int a, int b) { return a > b ? a : b; }
    int min(int a, int b) { return a < b ? a : b; }

Consider now the sign function sgn : Z → {-1, 0, 1}, with sgn(x) = -1 if x < 0, sgn(x) = 0 if x = 0, and sgn(x) = 1 if x > 0. Written this way, the function cannot be translated directly into a program, because the conditional operator allows only a binary decision, between two cases, not between several. We can reformulate the problem as follows: we make a first decision (test), for example on the condition x < 0, and if it is false we make a second one:

    sgn(x) = -1  if x < 0
             otherwise (x ≥ 0):  0  if x = 0
                                 1  otherwise (x > 0)

Rewritten like this, the function still needs several decisions to establish its value, but each of them is a logical decision with two branches, depending on the truth value of a condition, and can be expressed in C with the conditional operator. Another difference is that in the first variant each branch is identified by a condition and the order of the branches is irrelevant, since the conditions are mutually exclusive. In the second variant the order in which the conditions are evaluated matters, because the result of one decision provides useful information (written in parentheses) for the following decisions. Thus a false value for the condition x = 0 allows us to deduce that x > 0 only because it has already been established on this branch that x ≥ 0. With these observations, the function can be transcribed as follows (the first test is written here as x >= 0):

    int sgn(int x)
    {
      return x >= 0 ? x > 0 ? 1 : 0 : -1;
    }

In this case, the expression on the first (true) branch of the first decision is itself a conditional expression. The branches
unei expresii conditionale complexe si gruparea lor pot fi identificate unic, si nu depind de punerea in pagina a codului: orice ? corespunde unui unic : dupa cum orice decizie are o ramura pentru adevarat si una pentru fals introducere Versiune preliminara 7 3 octombrie 2011 Programarea calculatoarelor Note de curs Marius Minea 1 11 Descompunerea in subprobleme in conceperea unui program (rezolvarea unei probleme), daca identificam o prelucrare (un calcul) care reprezinta rezolvarea unei subprobleme, putem defini o functie in acest scop Ea poate fi atunci apelata, cu parametrii corespunzatori, ori de cate ori calculul (subproblema) respectiva apare in solutie De exemplul, sa scriem o functie care calculeaza minimul a trei numere a, b si c, de exemplu, intregi Comparam intai doua dintre numere Daca a b, e suficient sa examinam ca si candidati pe b si c in ambele cazuri, am redus problema la una mai simpla: calculul minimului a doua numere Presupunand existenta unei functii min2 care face acest lucru, scriem pentru problema initiala: int min3(int a, int b, int c) { return a c ? 
a : min2(b, c); } intr-un program complet scriem functiile in ordinea: min2, med3ord, mediana, deoarece fiecare o foloseste pe cea dinainte introducere Versiune preliminara 8 3 octombrie 2011 Programarea calculatoarelor Note de curs Marius Minea 1 introducere in programarea in C 1 1 Functii in limbajul C Calcule si functii La origine, rolul programelor e de a efectua in principal calcule matematice Discutam de aceea structura programelor facand o paralela cu notiunile din matematica, care stau si la baza studiului teoretic al limbajelor de programare in matematica, notiunea de calcul e strans legata de cea de : in calcule se folosesc functii cunoscute, se definesc altele noi si se aplica intr-o anumit succesiune La fel, codul dintr-un fisier de program C e structurat in functii, care sunt definite similar celor din matematica O functie simpla Fie ca exemplu functia care ridica la patrat un numar intreg Matematic, scriem: sqr : Z Z sqr(x) = x   x Aceasta definitie de functie are doua parti: prima specifica numele functiei (sqr), domeniul sau de definitie (multimea numerelor intregi, Z) si domeniul sau de valori, de asemenea Z A doua parte specifica modul de calcul al valorii functiei sqr(x) pornind de la valoarea argumentului x in limbajul C, aceleiasi functii are aspectul urmator: int sqr(int x) { return x * x; } Primul rand reprezinta , cu acelasi rol ca si declaratia sqr' : Z Z: el specifica numele functiei, domeniul de valori (cuvantul int dinaintea numelui functiei), si intre paranteze ( ) dupa numele functiei, numele parametrilor, precedati de tipul lor (un singur parametru x, tot intreg) Dupa antet urmeaza , scris intre acolade { } in interior, cuvantul return indica expresia care da valoarea functiei, folosind parametrul x; semnul * denota produsul O a doua functie Sa exprimam acum ridicarea la patrat a unui numar real Matematic, scriem: sqrf : R R sqrf(x) = x   x Functia sqrf e diferita de functia sqr scrisa anterior, deoarece are alt domeniu de definitie 
si de valori Strict vorbind, si operatia de inmultire e diferita, fiind definita pe alta multime, chiar daca se foloseste aceeasi notatie La fel si in limbajul C, nu putem folosi functia definita anterior pe intregi pentru a calcula patratul unui numar real, ci trebuie sa definim alta functie: float sqrf(float x) { return x * x; } Cuvantul f loat folosit pentru domeniul de definitie si de valori indica numerele reale 1 2 Tipuri si operatori Tipuri in termeni de limbaje de programare, spunem ca int si float denota tipuri Un (sau tip de date) desemneaza o multime de valori impreuna cu o serie de operatii definite pe aceste valori Deci nu putem echivala pur si simplu tipul int cu multimea Z si tipul float cu R E mai apropiata o analogie intre un tip si o structura algebrica, care de asemenea grupeaza o multime-suport cu un set de operatii, de exemplu inelul (Z, +,  ) sau corpul (R, +,  )• Dupa cum inmultirea nu e inversabila pe Z dar este pe R, tot asa exista diferente intre unii operatori pentru intregi si reali in C O alta diferenta fata de matematica e ca in C, atat intregii si realii au domeniu de valori finit, si implicit realii au precizie finita Detalii, si ce implica aceasta in calcule vom discuta ulterior Constante intregi si reale Constantele numerice pot avea tip intreg, sau tip real, daca au punct zecimal Deci, 2 0 si 2 reprezinta acelasi numar, dar au tip diferit Se pot scrie constante si in format cu exponent de 10; acestea sunt de tip real, chiar daca nu au punct zecimal: 5e2 sau 5E3 sau 500 reprezinta acelasi numar Conform standardului, constantele nu au semn; -3 sau +4 1 sunt expresii cu operatori unari introducere Versiune preliminara 1 26 martie 2008 Programarea calculatoarelor Note de curs Marius Minea Operatori aritmetici Operatorii aritmetici + - * si   au aceeasi semnificatie, asociativitate si precedenta ca in matematica Parantezele ( ) se pot folosi pentru a grupa subexpresii in modul dorit De exemplu, pentru trebuie scris x   (m * n) si nu x  
 m * n, care inseamna —   n Doua aspecte difera intre intregi si reali: impartirea   se efectueaza exact pentru reali si cu rest pentru intregi, iar pentru operanzi intregi exista si operatorul % (modulo) care pentru reali nu e definit Astfel, 7 0 2 0 e 3 5, dar 7 2 da valoarea 3 ! La impartirea intregilor cu semn, rezultatul are valoarea absoluta ca si pentru intregi pozitivi, iar semnul e dat de regula uzuala, deci 7 -2 este -3 Semnul restului (obtinut cu operatorul %) e acelasi cu semnul deimpartitului Deci a b * b + a° "b e egal cu a (ecuatia impartirii cu rest) Exemple fara semn: 9 5 este 1 si 9° "5 este 4 , iar cu semn: 9 -5 este -1 si 9%-5 este 4 ; -9 5 este -1 si -9%5 este -4 ; -9 -5 este 1 si -9%-5 este -4 Operatorii + si - exista si ca operatori unari, cu precedenta mai mare decat operatorii aritmetici binari, deci putem scrie expresii de forma -x + -2 sau 2 * +3 Orice expresie are si ea un tip, determinat de tipul operanzilor si de operatori: astfel, operatiile aritmetice pe intregi produc intregi, iar cele intre reali produc reali; conversii de tip discutam ulterior 1 3 Terminologie: sintaxa si semantica Sintaxa Termenii folositi pentru a discuta structura programelor scrise in limbaje de programare sunt similari cu cei folositi pentru textele scrise in limbaj natural Lin program e alcatuit dintr-o insiruire de , cele mai mici unitati cu inteles de sine statator: • (int, return): au inteles predefinit de standardul limbajului (nu se pot redefini) • : nume date de programator pentru functii, parametri, etc Un identificator e o secventa de caractere formata din litere mari si mici, liniuta de subliniere si cifre, care nu incepe cu o cifra si nu este un cuvant cheie Exemple: sqr, x, main, printf, N l, exit Numele functiilor de biblioteca sunt identificatori si ar putea fi refolosite in alt scop, dar generand confuzii • (intregi: -12, reali: 3 14, vom discuta ulterior caractere si siruri) • : operatori, separatori (ex , si ; ), parantezele ( ), acoladele { 
} etc in C se face distinctie intre litere mari si litere mici Astfel, nr, NR, Nr si nR sunt identificatori diferiti! Toate cuvintele cheie sunt scrise cu minuscule, ca si marea majoritate a functiilor standard intr-un program, elementele lexicale sunt grupate in constructii de limbaj mai complexe, dupa anumite reguli de scriere care formeaza limbajului Am dat ca exemplu definitia unei functii, formata din antet si corp Antetul unei functii (in general cu mai multi parametri) are sintaxa (forma): tip-rezultat nume-functie ( tip-parametru nume-parametru,       , tip-parametru nume-parametru ") Corpul unei functii contine intre acolade : elemente de limbaj ce specifica actiuni care sa fie executate de program Am folosit pana acum doar instructiunea return, cu sintaxa: return expresie ; intalnite pana acum sunt alcatuite sintactic dupa reguli similare celor din matematica Toti operatorii sunt indicati explicit: in loc de 2n sau n(n + 1) trebuie scris 2*n si respectiv n*(n+l) Semantica Sintaxa limbajului descrie doar felul cum arata un program (sau fragment) corect, dar nu si ce inseamna unui limbaj (de programare, a unei constructii de limbaj, sau a unui a program) descrie intelesul sau (efectul pe care il are) instructiunea return are ca efect (calculul valorii) expresiei precizate, care e returnata ca valoare a functiei pentru argumentele primite Executia functiei se incheie si se revine in locul de unde a fost apelata (aspecte detaliate ulterior) Punerea in pagina a programelor intr-un program C, (fara reprezentare pe ecran, cum ar fi caracterul spatiu, cel de tabulare, cel de linie noua) sunt intre elemente lexicale Ele nu sunt necesare decat daca prin omiterea lor, din doua elemente lexicale adiacente s-ar crea alt element lexical valid, schimband intelesul programului Astfel, functia sqr definita anterior poate fi scrisa int sqr(int x){return x*x;} prin eliminarea spatiilor Cele 3 spatii ramase sunt necesare ca separatori, in acest caz, intre cuvinte cheie si 
identifiers. The space is not needed in the expression 3+-5 but is needed in 3- -5, because -- would have a different meaning in C (decrement).

Following some conventions is recommended for better readability of C programs, for example aligning successive statements one under another, and indenting a sequence of statements (usually two spaces to the right) relative to the surrounding braces { }.

Programarea calculatoarelor, course notes, Marius Minea. Introduction, preliminary version, 26 March 2008.

1.4 Function calls

We define a function that computes the discriminant of a quadratic equation, $a \cdot x^2 + b \cdot x + c = 0$. It has three parameters, separated by commas, with the type given for each parameter individually:

    float discrim(float a, float b, float c) { return sqrf(b) - 4 * a * c; }

The call expression

The expression that computes the function value contains the expression sqrf(b): the function sqrf is called (used) to compute the square of the parameter b. The syntax is the same as in mathematics:

    function ( argument-expression , argument-expression , ..., argument-expression )

where the function arguments, separated by commas, can be arbitrary expressions, unlike the formal parameters in the header, which are identifiers (names). Also unlike the header (but as in mathematics), the type of each argument is not stated explicitly, but it must match the type of the corresponding parameter, declared in the header. Thus we used the function sqrf, and not sqr, because the argument b is real and not an integer.

The expressions given as arguments are evaluated to determine the argument values for which the function value is computed. Any expression can be used as an argument of a function call, including one that contains another function call. For example, starting from $x^6 = (x \cdot x^2)^2$ we define:

    float pow6(float x) { return sqrf(x * sqrf(x)); }

Declaration and use

The function discrim is correctly defined if the meaning of the identifier sqrf is known, for example if the definition of sqrf appears in the text of the program before the definition of the function discrim (likewise for pow6). In C, the meaning of any identifier must be specified before it is used (it must be declared, an aspect treated in detail later), just as in a mathematical text every symbol must be defined before being used. For example, the parameters of a function are declared in its header, where their name and type are given, and they can then be used in the body of the function.

1.5 The main program

So far we have only defined and called individual functions, without writing a complete program. The C standard specifies that in a hosted environment (for example, under an operating system), the function conventionally named main is called when the program starts running. This function has type int, and returns to the execution environment an integer value that the environment can test to determine whether the program ran successfully. By convention, successful termination is signaled by returning the value 0 (zero) from main. Termination with an error is signaled by returning a nonzero value, which may represent a code identifying, by various conventions, the type or cause of the error.

With these clarifications, the simplest C program can be written:

    int main(void) { return 0; }

The program returns the value 0 (successful termination), and nothing more. The keyword void denotes the empty type (with an empty set of values). It indicates that the function takes no parameters. We will see later that main can have parameters passed by the execution environment.

1.6 Printing text

A program has an observable effect only if it causes a change (called a side effect) in the state of the execution environment (its surroundings), for example by printing (writing) a message. In C, writing (and reading) is also done through functions. The function printf allows printing in a variety of ways, by specifying a format, or pattern. In its simplest use, the call printf("salut!") has the effect of
printing the text salut! on the screen.

Syntactically, this is a function call, just like sqr(4), with a character string as its argument. In C, a string constant is written between double quotes " ", which are not part of the string and only serve to identify it and separate it from the rest of the program source text. The function printf returns an int, namely the number of characters written, so the expression printf("salut!") has a value, 6. Usually we are not interested in this value, only in the effect of printing.

Library functions and header files

The function printf is a standard library function specified by the C language standard. Its declaration (giving its name, type and parameters) is provided in a header file named stdio.h (from standard input/output), present in every implementation that conforms to the standard. The declaration of printf, mandatory before use, is included with the line

    #include <stdio.h>

This is a preprocessing directive, indicated by the character # at the beginning of the line and addressed to the preprocessor, which processes the program source before the actual compilation. The effect is as if the contents of the indicated file, stdio.h, (with all its declarations) were included at that point in the program.

1.7 Sequencing

We can now write a first complete program that has a visible effect and prints a text on the screen:

    #include <stdio.h>

    int main(void)
    {
      printf("hello, world!");
      return 0;
    }

This simple program highlights a fundamental aspect of programming, namely sequencing. In general, the body of any function (in this case, main) contains a sequence of statements, each representing actions to be carried out by the program. Statements are executed sequentially, in the order given by the program text, apart from exceptions (discussed later) that explicitly change this behavior. For example, the return statement ends the execution of a function, regardless of whether other statements follow in the function body. The given program has two statements, whose effects are:
- the call to printf displays the text hello, world!
- the execution of main (and of the program) ends, returning 0 (success) to the execution environment

The expression statement

The first statement in the given program is a function call. In general, a statement with the syntax

    expression ;

has the effect of evaluating the given expression. In the case above, what matters is the side effect of evaluating the expression, namely the printing; the result of the expression (13, the number of characters written by printf) is ignored and not used further. The statements sqr(4); (if the function sqr is declared), or 2 + 3; or even 3.14; are also correct, but they have no effect: the expressions are evaluated, but their values are not used, nor are they printed, nor is there any other visible effect.

The sign ; does not separate statements, but is an integral part of them (each one, including the last, would be incorrect without it). So in C the ; is not a statement separator but a statement terminator.

1.8 Comments

The structure of a C program is subject to strict rules (the syntax of the language). It is nevertheless useful to annotate the program text with free-form explanations. C allows two kinds of comments:
- starting with the characters /* and ending with the characters */ (it may span any number of lines)
- starting with the characters // and running up to the next newline character (not including it)

Comment-start sequences are not interpreted inside string constants (comments cannot begin inside a string). The compiler ignores the contents of a comment and only looks for its end. In particular, comment-start characters are not interpreted inside a comment, so comments cannot be nested inside other comments. In the sequence /* 1 /* 2 */ 3 */ the comment ends at the first */ group, and the remainder 3 */ is not a valid sequence of C code.

Comments are treated as if they were white space: they have
the effect of a separator between two successive lexical elements: int/* an integer */x is equivalent to int x.

In software development practice, commenting programs is absolutely necessary. Comments:
- help others understand the program, or even the author on a later reading
- contain information about the evolution of the program: versions, revisions, additions, fixed errors, etc.
- serve as a basis for documentation (sometimes extracted automatically from comments with a prescribed structure)

As a minimum, the program should be documented at the level of functions, stating for each:
- briefly, what processing it performs and what its purpose is
- the meaning of each parameter, restrictions on valid values, and any relations between parameters
- the specification of the function (the relation between the parameters and the returned value or other values computed by the function)
- any side effects (reads, writes, modified global variables, memory allocation)
- restrictions on calling the function (for example, before or after other calls or processing in the program)

1.9 Printing numbers

Programs with several functions

An example of a program that defines and then calls a function:

    #include <stdio.h>

    int sqr(int x) { return x * x; }    // defined before use

    int main(void)
    {
      printf("3 times 2 squared is ");    // prints a string
      printf("%d", 3 * sqr(2));           // prints an integer: 12
      return 0;
    }

A program can contain several function definitions in succession (and other declarations, as we will see later). The function sqr is defined before main, so that it can be called from the main program.

Printing an integer

To print an integer, the function printf is called with two arguments. The mechanism of functions with a variable number of arguments will be detailed later; in essence, the first argument of printf is always a character string, called the format or pattern because, by examining it, the function determines how many arguments follow and what types they have. To print an integer value (in base 10), the first argument is the format (character string) "%d" (from decimal), and the second argument is the integer-valued expression to be printed.

Printing a real

The following example displays a value given by a library function:

    #include <math.h>     // included for the declaration of the function asin
    #include <stdio.h>

    int main(void)
    {
      printf("The value of pi is 2 * asin(1): ");    // text
      printf("%f", 2 * asin(1.0));                   // displays the value 3.141593
      return 0;
    }

The declaration of the function asin (arcsine) is given, together with other mathematical functions, in the file math.h:

    double asin(double x);

The parameter and the result have a real type, but with higher precision than float, hence the name (double precision). The type double (a C keyword) is used by most mathematical library functions, since the precision of float, about 6 significant decimal digits, is often insufficient even for ordinary calculations.

Printing works much as for integers, but with a different format given as argument, the string "%f" (from floating point). By default, reals are printed with 6 digits after the decimal point.

Note that the first occurrence of 2 * asin(1) in the program is inside a character string: it is plain text, displayed as such by printf. The second occurrence is an expression, given as an argument to printf: it is evaluated by calling the function asin, and the resulting real value is printed.

Type conversions

The expression 2 * asin(1.0) multiplies an integer by a real. In operations between the two types, the integer is converted automatically (implicitly) to real. For function arguments, too, the value is converted to the type of the corresponding parameter, if this is known from the function declaration; so we could also have written asin(1). With printf, the format %f can also be used for a float, which is automatically converted to double. Other conversion rules will be discussed and systematized later.

End of line

For successive calls to printf, output continues from where it stopped. Moving to a new line must be specified explicitly. A new line in a text is marked by a special (control) character, whose printing has the effect of advancing one line. Because this character has no visible appearance, but is characterized by its effect, it cannot be represented directly in the program (a string constant between " " cannot be split across several lines). By convention, the newline character is represented in a program (for example, in a character string) by the two-character sequence \n (from newline). The sequence

    printf("one\n"); printf("two");

prints the two words on successive lines, and the cursor stays on the second line. The same effect is obtained with printf("one\ntwo");

The character \ used to introduce the special sequence \n and others must be doubled to represent itself in a string. Thus printf("c:\\windows") will print c:\windows

The general format in printf

Values of several types can be written in the same call to printf. The function printf scans the character string given as its first argument, and prints each character represented directly or by a special sequence (\n). On encountering a format specifier, such as %d or %f, printf takes the next of the arguments it received and prints it according to the indicated format (integer, real, etc.). For example,

    printf("an integer %d and a real %f\n", 2, 3.14)

will print

    an integer 2 and a real 3.140000

and then move to a new line. To be interpreted as an ordinary character, the character % must be doubled in the format string: printf("7%%") will print 7%

The match between the number of % format specifiers in the string given as first argument and the number of arguments that follow, as well as the match between the indicated types and those of the arguments, must be checked by the programmer. In case of a mismatch, the behavior of printf is undefined!
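The formatting rules of this section can be gathered in one small function. This is only an illustrative sketch, not code from the notes: the name print_format_demo is hypothetical, and the function adds up the values returned by printf simply to show that each call yields the number of characters written.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the course notes): prints five lines
   that exercise the rules described above and returns the total number
   of characters written, i.e. the sum of printf's return values. */
int print_format_demo(void)
{
  int total = 0;
  float f = 2.5f;   /* a float argument is converted to double when passed to printf */

  total += printf("one\n");                   /* \n moves to a new line */
  total += printf("%d and %f\n", 2, 3.14);    /* an integer and a real: 2 and 3.140000 */
  total += printf("%f\n", f);                 /* %f also works for a float: 2.500000 */
  total += printf("7%%\n");                   /* %% prints a single % character */
  total += printf("c:\\windows\n");           /* \\ prints a single backslash */
  return total;
}
```

Called from main, for example as a statement print_format_demo(); the function prints each item on its own line. Every specifier has exactly one argument of the matching type, which is the correspondence the programmer is responsible for checking.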
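1.10 The conditional operator

The conditional expression

Often the value of a function is computed with different formulas (expressions) on different parts of its domain. An example is the absolute value function for integers:

$$\mathrm{abs} : \mathbb{Z} \to \mathbb{Z}, \qquad \mathrm{abs}(x) = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{otherwise} \end{cases}$$

In C, such a definition is written with the conditional operator ? :, in an expression of the form condition ? expression1 : expression2, whose value is expression1 when the condition is true and expression2 otherwise:

    int abs(int x) { return x >= 0 ? x : -x; }    // unary minus operator

In the definition with the brace we wrote the condition "otherwise" instead of x < 0. Conditions are written with the comparison operators <, >, <=, >= (less than, greater than, less than or equal, greater than or equal), == (equal) and != (not equal). Other operators, as well as details about the notion of logical value in C, will be discussed later.

As further examples, we can define simple functions for maximum and minimum in the same way as the absolute value function:

    int max(int a, int b) { return a > b ? a : b; }
    int min(int a, int b) { return a < b ? a : b; }

A function defined on more than two cases is the sign function:

$$\mathrm{sgn} : \mathbb{Z} \to \{-1, 0, 1\}, \qquad \mathrm{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$

Written this way, the function cannot be translated directly into a program, because the conditional operator allows only a binary decision, between two cases, not between several. We can reformulate the problem as follows: we make a first decision (test), for example on the condition x < 0, and in the remaining case a second decision on x = 0:

$$\mathrm{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ \begin{cases} 0 & \text{if } x = 0 \\ 1 & \text{otherwise } (x > 0) \end{cases} & \text{otherwise } (x \ge 0) \end{cases}$$

As rewritten, the function still needs several decisions to establish its value, but each of them is a logical decision with two branches, depending on the truth value of a condition, and can be expressed in C with the conditional operator. Another difference is that in the first version each branch is identified by a condition, and the order of the branches is irrelevant, since the conditions are mutually exclusive. In the second version, the order in which the conditions are evaluated matters, because the result of one decision gives useful information (noted in parentheses) for the following decisions. Thus, a false value for the condition x = 0 lets us deduce that x > 0 only because it has already been established on this branch that x ≥ 0.

With these observations, the function can be transcribed as follows (here with the first test written as x >= 0):

    int sgn(int x) { return x >= 0 ? x > 0 ? 1 : 0 : -1; }

In this case, the expression on the first (true) branch of the first decision is itself a conditional expression. The branches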
of a complex conditional expression and their grouping can be identified uniquely, and do not depend on the layout of the code: every ? corresponds to a unique : just as every decision has one branch for true and one for false.

1.11 Decomposition into subproblems

When designing a program (solving a problem), if we identify a processing step (a computation) that amounts to solving a subproblem, we can define a function for that purpose. It can then be called, with the appropriate parameters, whenever that computation (subproblem) appears in the solution.

For example, let us write a function that computes the minimum of three numbers a, b and c, say integers. We first compare two of the numbers. If a < b, it suffices to consider a and c as candidates; if a ≥ b, it suffices to consider b and c. In both cases we have reduced the problem to a simpler one: computing the minimum of two numbers. Assuming the existence of a function min2 that does this, we write for the original problem:

    int min3(int a, int b, int c) { return a < b ? min2(a, c) : min2(b, c); }

    [...] c ? a : min2(b, c); }

In a complete program we write the functions in the order min2, med3ord, mediana, because each one uses the one before it.

1 Introduction to programming in C

1.1 Functions in the C language

Computations and functions

Originally, the role of programs was mainly to carry out mathematical computations. We therefore discuss the structure of programs by drawing a parallel with notions from mathematics, which also underlie the theoretical study of programming languages. In mathematics, the notion of computation is closely tied to that of function: in computations, known functions are used, new ones are defined, and they are applied in a certain order. Likewise, the code in a C program file is structured into functions, which are defined similarly to those in mathematics.

A simple function

Take as an example the function that squares an integer. Mathematically, we write:

$$\mathrm{sqr} : \mathbb{Z} \to \mathbb{Z}, \qquad \mathrm{sqr}(x) = x \cdot x$$

This function definition has two parts: the first specifies the name of the function (sqr), its domain (the set of integers, Z) and its codomain, also Z. The second part specifies how the value of the function sqr(x) is computed from the value of the argument x.

In C, the definition of the same function looks as follows:

    int sqr(int x)
    {
      return x * x;
    }

The first line is the function header, with the same role as the declaration sqr : Z → Z: it specifies the name of the function, the type of its values (the word int before the function name), and, between parentheses ( ) after the function name, the names of the parameters, each preceded by its type (a single parameter x, also an integer). After the header comes the function body, written between braces { }. Inside, the word return indicates the expression that gives the value of the function, using the parameter x; the sign * denotes the product.

A second function

Let us now express the squaring of a real number. Mathematically, we write:

$$\mathrm{sqrf} : \mathbb{R} \to \mathbb{R}, \qquad \mathrm{sqrf}(x) = x \cdot x$$

The function sqrf is different from the function sqr written earlier, because it has a different domain of
definition and of values. Strictly speaking, the multiplication operation is different too, being defined on a different set, even though the same notation is used. Likewise in C, we cannot use the function defined earlier on integers to compute the square of a real number, but must define another function:

    float sqrf(float x)
    {
      return x * x;
    }

The word float, used for the domain and the codomain, denotes the real numbers.

1.2 Types and operators

Types

In programming language terms, we say that int and float denote types. A type (or data type) designates a set of values together with a number of operations defined on those values. So we cannot simply equate the type int with the set Z and the type float with R. A closer analogy is between a type and an algebraic structure, which likewise groups a carrier set with a set of operations, for example the ring $(\mathbb{Z}, +, \cdot)$ or the field $(\mathbb{R}, +, \cdot)$. Just as multiplication is not invertible on Z but is on R (for nonzero elements), there are differences between some operators for integers and reals in C. Another difference from mathematics is that in C both integers and reals have a finite range of values, and consequently reals have finite precision. The details, and what this implies in computations, will be discussed later.

Integer and real constants

Numeric constants can have integer type, or real type if they have a decimal point. So 2.0 and 2 represent the same number, but have different types. Constants can also be written in a format with a power-of-10 exponent; these have real type even if they lack a decimal point: 5e2 and 0.5E3 represent the same number as 500. According to the standard, constants have no sign; -3 or +4.1 are expressions with unary operators.

Programarea calculatoarelor, course notes, Marius Minea. Introduction, preliminary version, 4 October 2010.

Arithmetic operators

The arithmetic operators + - * and / have the same meaning, associativity and precedence as in mathematics. Parentheses ( ) can be used to group subexpressions as desired. For example, for $\frac{x}{m \cdot n}$ one must write x / (m * n).
si nu x   m * n, care inseamna —   n Doua aspecte difera intre intregi si reali: impartirea   se efectueaza exact pentru reali si cu rest pentru intregi, iar pentru operanzi intregi exista si operatorul % (modulo) care pentru reali nu e definit Astfel, 7 0 2 0 e 3 5, dar 7 2 da valoarea 3 ! La impartirea intregilor cu semn, rezultatul are valoarea absoluta ca si pentru intregi pozitivi, iar semnul e dat de regula uzuala, deci 7 -2 este -3 Semnul restului (obtinut cu operatorul %) e acelasi cu semnul deimpartitului Deci a b * b + a° "b e egal cu a (ecuatia impartirii cu rest) Exemple fara semn: 9 5 este 1 si 9° "5 este 4 , iar cu semn: 9 -5 este -1 si 9%-5 este 4 ; -9 5 este -1 si -9%5 este -4 ; -9 -5 este 1 si -9%-5 este -4 Operatorii + si - exista si ca operatori unari, cu precedenta mai mare decat operatorii aritmetici binari, deci putem scrie expresii de forma -x + -2 sau 2 * +3 Orice expresie are si ea un tip, determinat de tipul operanzilor si de operatori: astfel, operatiile aritmetice pe intregi produc intregi, iar cele intre reali produc reali; conversii de tip discutam ulterior 1 3 Terminologie: sintaxa si semantica Sintaxa Termenii folositi pentru a discuta structura programelor scrise in limbaje de programare sunt similari cu cei folositi pentru textele scrise in limbaj natural Lin program e alcatuit dintr-o insiruire de , cele mai mici unitati cu inteles de sine statator: • (int return): au inteles predefinit de standardul limbajului (nu se pot redefini) • : nume date de programator pentru functii, parametri, etc Un identificator e o secventa de caractere formata din litere mari si mici, liniuta de subliniere si cifre, care nu incepe cu o cifra si nu este un cuvant cheie Exemple: sqr, x, main, printf, N l, exit Numele functiilor de biblioteca sunt identificatori si ar putea fi refolosite in alt scop, dar generand confuzii • (intregi: -12, reali: 3 14, vom discuta ulterior caractere si siruri) • : operatori, separatori (ex , si ; ), parantezele ( ), 
acoladele { } etc in C se face distinctie intre litere mari si litere mici Astfel, nr, NR, Nr si nR sunt identificatori diferiti! Toate cuvintele cheie sunt scrise cu minuscule, ca si marea majoritate a functiilor standard intr-un program, elementele lexicale sunt grupate in constructii de limbaj mai complexe, dupa anumite reguli de scriere care formeaza limbajului Am dat ca exemplu definitia unei functii, formata din antet si corp Antetul unei functii (in general cu mai multi parametri) are sintaxa (forma): tip-rezultat nume-functie ( tip-parametru nume-parametru,       , tip-parametru nume-parametru ") Corpul unei functii contine intre acolade : elemente de limbaj ce specifica actiuni care sa fie executate de program Am folosit pana acum doar instructiunea return, cu sintaxa: return expresie ; intalnite pana acum sunt alcatuite sintactic dupa reguli similare celor din matematica Toti operatorii sunt indicati explicit: in loc de 2n sau n(n + 1) trebuie scris 2*n si respectiv n*(n+l) Semantica Sintaxa limbajului descrie doar felul cum arata un program (sau fragment) corect, dar nu si ce inseamna unui limbaj (de programare, a unei constructii de limbaj, sau a unui a program) descrie intelesul sau (efectul pe care il are) instructiunea return are ca efect (calculul valorii) expresiei precizate, care e returnata ca valoare a functiei pentru argumentele primite Executia functiei se incheie si se revine in locul de unde a fost apelata (aspecte detaliate ulterior) Punerea in pagina a programelor intr-un program C, (fara reprezentare pe ecran, cum ar fi caracterul spatiu, cel de tabulare, cel de linie noua) sunt intre elemente lexicale Ele nu sunt necesare decat daca prin omiterea lor, din doua elemente lexicale adiacente s-ar crea alt element lexical valid, schimband intelesul programului Astfel, functia sqr definita anterior poate fi scrisa int sqr(int x){return x*x;} prin eliminarea spatiilor Cele 3 spatii ramase sunt necesare ca separatori, in acest caz, intre cuvinte 
cheie si identificatori Spatiul nu e necesar in expresia 3+-5 dar e necesar in 3- -5 pentru ca — ar avea alt inteles (decrementate) in limbajul C Se recomanda respectarea unor conventii pentru mai buna lizibilitate a programelor C, ca de exemplu alinierea una sub alta a instructiunilor succesive, si (de regula cu doua spatii la dreapta) a unei secvente de instructiuni fata de acoladele { } inconjuratoare introducere Versiune preliminara 2 4 octombrie 2010 Programarea calculatoarelor Note de curs Marius Minea 1 4 Apelarea functiilor Definim o functie pentru calculul discriminantului unei ecuatii de gradul ii, a   x2 + b   x + c = 0 Ea are trei parametri, separati prin virgula, cu tipul indicat pentru fiecare parametru in parte float discrim(float a, float b, float c) return sqrf(b) - 4 * a * c; } Expresia apel in expresia de calcul a valorii functiei apare expresia sqrf (b) : e apelata (folosita) functia sqrf pentru a calcula patratul parametrului b Sintaxa este aceeasi ca si in matematica: functie ( expresie-argument , expresie-argument , , expresie-argument ") in care argumentele functiei, separate cu virgule, pot fi expresii arbitrare, spre deosebire de parametrii formati din antet, care sunt identificatori (nume) Tot spre deosebire de antet (dar ca in matematica), tipul fiecarui argument nu se indica explicit, dar trebuie sa corespunda cu tipul parametrului corespunzator, declarat in antet Astfel, am folosit functia sqrf, si nu sqr, deoarece argumentul b e real si nu intreg Expresiile date ca argument sunt evaluate pentru a determina valorile argumentelor pentru care se calculeaza valoarea functiei Ca argument al unui apel de functie putem folosi orice expresie, inclusiv continand un alt apel de functie De exemplu, pornind de la x(i = (ж   x2)2 definim: float pow6(float x) { return sqrf(x * sqrf(x)); } Declaratie si utilizare Functia discrim e corect definita daca se cunoaste semnificatia identificatorului sqrf, de exemplu daca definitia lui sqrf apare in textul 
programului inainte de definitia functiei discrim (la fel pentru pow6) in limbajul C, semnificatia oricarui identificator trebuie precizata inainte de a fi folosit (trebuie , aspect tratat in detaliu ulterior), tot dupa cum intr-un text matematic trebuie definit intai fiecare simbol folosit De exemplu, parametrii unei functii sunt declarati in antetul acesteia, unde li se precizeaza numele si tipul, si ei pot fi apoi folositi in corpul functiei 1 5 Programul principal Pana acum am definit si apelat doar functii individuale, fara a scrie un program complet Standardul C precizeaza ca intr-un (de exemplu sub un sistem de operare), la inceputul rularii programului se apeleaza functia numita in mod conventional main Aceasta functie are tipul int, si returneaza mediului de executie o valoare intreaga pe care acesta o poate testa pentru a determina daca rularea programului s-a incheiat cu succes Conventional, terminarea cu succes se semnaleaza prin returnarea valorii 0 (zero) din main Terminarea cu eroare se semnaleaza prin returnarea unei valori nenule, care poate reprezenta un cod ce identifica, prin diverse conventii, tipul sau cauza erorii Cu aceste precizari, cel mai simplu program C poate fi scris: int main(void) return 0; } Programul returneaza valoarea 0 (terminare cu succes), si atat Cuvantul cheie void reprezinta tipul vid (cu multime vida de valori) El indica faptul ca functia nu ia nici un parametru Vom vedea mai tarziu ca functia main poate sa aiba parametri transmisi de mediul de executie 1 6 Tiparirea de texte Un program are un efect observabil doar daca provoaca o schimbare (numita efect lateral) in starea mediului de executie (mediului inconjurator), de exemplu prin tiparirea (scrierea) unui mesaj in limbajul C, scrierea (si citirea) se face tot prin intermediul unor functii Functia printf permite tiparirea intr-o varietate de moduri, prin specificarea unui format sau tipar in cel mai simplu mod de folosire, apelul printf ("salut!") are ca efect tiparirea 
textului salut! pe ecran introducere Versiune preliminara 3 4 octombrie 2010 Programarea calculatoarelor Note de curs Marius Minea Din punct de vedere sintactic, acesta este un apel de functie, la fel ca si sqr (4), avand ca argument un sir de caractere in C, o constanta sir de caractere se scrie intre ghilimele " " care nu fac parte din sir ci au doar rolul de a-1 identifica si a-1 separa de restul textului sursa al programului Functia printf returneaza un int, si anume numarul de caractere scris, deci expresia printf ("salut!") are o valoare, 6 De regula nu ne intereseaza aceasta, ci doar efectul tiparirii Functii de biblioteca si fisiere antet Functia printf este o specificata de standardul limbajului C ei (precizand numele, tipul si parametrii) e data intr-un ( ) numit stdio h (de la standard input output) prezent in fiecare implementare conforma cu standardul Declaratia functiei printf, obligatorie inainte de folosire, se include cu secventa #include Aceasta este o , indicata prin caracterul # la inceput de linie, adresata , care prelucreaza sursa programului inainte de compilarea propriu-zisa Efectul e ca si cum continutul fisierului indicat, stdio h, (cu toate declaratiile) ar fi inclus la acel punct din program 1 7 Secventierea Putem scrie acum un prim program complet care are un efect vizibil si tipareste un text pe ecran: #include int main(void) printf("hello, world!"); return 0; } Acest program simplu evidentiaza un aspect fundamental in programare, si anume in general, corpul oricarei functii (in acest caz, main) contine o secventa de , fiecare reprezentand actiuni care trebuie efectuate de program instructiunile se executa secvential, in ordinea data de textul programului, mai putin exceptiile (discutate ulterior) care modifica explicit acest comportament De exemplu, instructiunea return incheie executia unei functii, indiferent daca in corpul functiei mai urmeaza alte instructiuni Programul dat are doua instructiuni, al caror efect este: - apelul la 
functia printf are ca efect afisarea textului hello, world! - se incheie executia functiei main (si a programului), returnand 0 (succes) mediului de executie instructiunea expresie Prima instructiune din programul dat e un apel de functie in general, o instructiune cu sintaxa expresie ; are ca efect evaluarea expresiei date in cazul de mai sus, esential e efectul lateral al evaluarii expresiei, si anume tiparirea; rezultatul expresiei (13, numarul de caractere scris de printf) e ignorat si nu e folosit mai departe Sunt corecte si instructiunile sqr(4); (daca e declarata functia sqr), sau 2 + 3; sau chiar 3 14; insa ele nu au nici un efect: expresiile sunt evaluate, dar nu sunt folosite si nici nu are loc tiparirea lor sau alt efect vizibil Semnul ; nu separa instructiunile, ci face parte integranta din ele (fiecare, inclusiv ultima, ar fi incorecta fara prezenta sa) Deci in limbajul C ; nu e separator, ci terminator de instructiuni 1 8 Comentarii Structura unui program C e supusa unor reguli stricte (sintaxa limbajului) Totusi, e utila anotarea textului programului cu explicatii in format liber Limbajul C admite doua feluri de : - incepand cu caracterele  * si terminat cu caracterele *  (se poate intinde pe oricate randuri) - incepand cu caracterele   si pana la urmatorul caracter de linie noua (neincluzandu-1 pe acesta) Secventele de inceput de comentariu nu sunt interpretate in interiorul constantelor sir de caractere (comentariile nu pot incepe intr-un sir) Compilatorul ignora continutul unui comentariu, si cauta doar sfarsitul sau in particular, intr-un comentariu nu se interpreteaza caracterele de inceput de comentariu, si deci nu se pot incadra comentarii in alte comentarii in secventa  * 1  * 2 *  3 *  comentariul se incheie la primul grup *  iar restul 3 *  nu e o secventa valida de cod C introducere Versiune preliminara 4 4 octombrie 2010 Programarea calculatoarelor Note de curs Marius Minea Comentariile sunt tratate ca si cum ar fi spatii albe, ele au 
the effect of a separator between two successive lexical elements: int/* an integer */x is equivalent to int x.

In software development practice, commenting programs is absolutely necessary. Comments:
- help others understand the program, or even the author on a later reading
- contain information about the program's evolution: versions, revisions, additions, fixed bugs, etc.
- serve as a basis for documentation (sometimes extracted automatically from comments with a prescribed structure)
At a minimum, a program should be documented at the level of functions, stating for each:
- briefly, what processing it performs and for what purpose
- the meaning of each parameter, restrictions on valid values, and any relations between parameters
- the function's specification (the relation between the parameters and the returned value or other values computed by the function)
- any side effects (reads, writes, modified global variables, memory allocation)
- restrictions on calling the function (for example before or after other calls or processing in the program)

1.9 Printing numbers

Programs with several functions

An example of a program that defines and then calls a function:

    #include <stdio.h>
    int sqr(int x) { return x * x; }    // defined before use
    int main(void)
    {
      printf("3 ori 2 la patrat e ");   // prints a string
      printf("%d", 3 * sqr(2));         // prints an integer: 12
      return 0;
    }

A program may contain several function definitions in succession (and other declarations, as we will see later). The function sqr is defined before main so that it can be called from the main program.

Printing an integer

To print an integer, printf is called with two arguments. The mechanism of functions with a variable number of arguments will be detailed later; in essence, the first argument of printf is always a character string, called the format, because by examining it the function determines how many arguments follow and what types they have. To print an integer value (in base 10), one gives as first argument
the format (character string) "%d" (from decimal), and as second argument the integer-valued expression to be printed.

Printing a real number

The following example displays the value given by a library function:

    #include <math.h>     // included for the declaration of asin
    #include <stdio.h>
    int main(void)
    {
      printf("Valoarea lui pi este 2 * asin(1): ");   // text
      printf("%f", 2 * asin(1.0));                    // displays the value 3.141593
      return 0;
    }

The declaration of asin (arcsine) is given, together with other mathematical functions, in the file math.h:
    double asin(double x);
The parameter and the result have a real type, but with higher precision than float, hence the name (double precision). The type double (a C keyword) is used by most mathematical library functions, since the roughly 6 decimal digits of precision of float are often insufficient even for ordinary calculations. Printing works as for integers, but with a different format, the string "%f" (from floating point). By default, reals are printed with 6 digits after the decimal point. Note that the first occurrence of 2 * asin(1) in the program is inside a character string: it is plain text, displayed as such by printf. The second occurrence is an expression, given as an argument to printf: it is evaluated by calling asin, and the resulting real value is printed.

Type conversions

The expression 2 * asin(1.0) multiplies an integer by a real. In operations between the two types, the integer is automatically (implicitly) converted to real. For function arguments too, the value is converted to the type of the corresponding parameter, if that type is known from the function's declaration; so we could also have written asin(1). With printf, the format %f can also be used for float, which is automatically converted to double. Other conversion rules will be discussed and systematized later.

End of line

For successive calls
to printf, output continues from where it stopped. Moving to a new line must be specified explicitly. A new line in a text is marked by a special (control) character, whose printing advances the output by one line. Because this character has no visible appearance and is characterized only by its effect, it cannot be represented directly in the program (a string constant between " " cannot be split across several lines). By convention, the newline character is represented in the program (for example in a character string) by the two-character sequence \n (from newline). The sequence
    printf("unu\n"); printf("doi");
prints the two words on successive lines, leaving the cursor on the second line. The call printf("unu\ndoi"); has the same effect. The \ character, used to introduce the special sequence \n and others, must be doubled to represent itself in a string. Thus printf("c:\\windows") prints c:\windows.

The general format in printf

A single call to printf can combine the output of values of several types. The printf function traverses the character string given as first argument and prints each character, whether represented directly or through special sequences (\n). On encountering a format specifier, such as %d or %f, printf takes the next of the arguments it received and prints it according to the indicated format (integer, real, etc.). For example, printf("un intreg %d si un real %f\n", 2, 3.14) prints
    un intreg 2 si un real 3.140000
and then moves to a new line. To be interpreted as an ordinary character, the % character must be doubled in the format string: printf("7%%") prints 7%. The match between the number of % format specifiers in the first argument and the number of following arguments, as well as the match between the indicated types and the types of the arguments, must be checked by the programmer. In case of a mismatch, the behavior of printf is undefined!
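The formatting rules above can be tried out in a minimal sketch. The helper names print_mixed and print_escapes are hypothetical, chosen only for this illustration; each one simply returns the result of printf, that is, the number of characters written.

```c
#include <stdio.h>

/* Hypothetical helper: prints text, an integer (%d) and a real (%f),
   then a newline. Returns printf's result: the number of characters written. */
int print_mixed(int n, double x)
{
    return printf("an integer %d and a real %f\n", n, x);
}

/* Hypothetical helper: %% prints one percent sign, \\ prints one backslash. */
int print_escapes(void)
{
    return printf("100%% of c:\\windows\n");
}
```

Calling print_mixed(2, 3.14) prints the line "an integer 2 and a real 3.140000"; the count it returns includes the final newline character.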
1.10 The conditional operator

The conditional expression

Often the value of a function is computed by different formulas (expressions) on different parts of its domain. An example is the absolute value function for integers, abs : Z → Z:

$\mathrm{abs}(x) = \begin{cases} x & x \ge 0 \\ -x & \text{otherwise} \end{cases}$

    int abs(int x) { return x >= 0 ? x : -x; }    // unary minus operator

In the definition with a brace we wrote the condition "otherwise" instead of x < 0 [...] The relational operators are <, >, <=, >= (less than, greater than, less than or equal, greater than or equal), == (equal) and != (different). Other operators, as well as details about the notion of logical value in C, will be discussed later. As examples, we can define, similarly to the absolute value function, simple functions for maximum and minimum:

    int max(int a, int b) { return a > b ? a : b; }
    int min(int a, int b) { return a < b ? a : b; }

[...] sgn : Z → {−1, 0, 1}

$\mathrm{sgn}(x) = \begin{cases} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{cases}$

Written this way, the function cannot be translated directly into a program, because the conditional operator allows only a binary decision, between two cases, not several. We can reformulate the problem as follows: we make a first decision (test), for example on the condition x < 0 [...]

$\mathrm{sgn}(x) = \begin{cases} -1 & x < 0 \\ \begin{cases} 0 & x = 0 \\ 1 & \text{otherwise } (x > 0) \end{cases} & \text{otherwise } (x \ge 0) \end{cases}$

As rewritten, the function still requires several decisions to establish its value, but each is a logical decision with two branches, depending on the truth value of a condition, and can be expressed in C with the conditional operator. Another difference is that in the first variant each branch is identified by a condition and the order among them is irrelevant, the conditions being mutually exclusive. In the second variant, the order in which the conditions are evaluated matters, the result of one decision providing useful information (noted in parentheses) for the following decisions. Thus a false value for the condition x = 0 lets us deduce that x > 0 only because it has already been established on this branch that x ≥ 0. With these observations, the function can be transcribed directly as follows:

    int sgn(int x) { return x >= 0 ? x > 0 ? 1 : 0 : -1; }

In this case, the expression on the first (true) branch of the first decision is itself a conditional expression. The branches
of a complex conditional expression and their grouping can be identified uniquely and do not depend on the layout of the code: every ? corresponds to a unique :, just as every decision has one branch for true and one for false.

1.11 Decomposition into subproblems

When designing a program (solving a problem), if we identify a processing step (a computation) that amounts to solving a subproblem, we can define a function for that purpose. It can then be called, with the appropriate parameters, whenever that computation (subproblem) appears in the solution. For example, let us write a function that computes the minimum of three numbers a, b and c, say integers. We first compare two of the numbers. If a < b, it suffices to consider a and c as candidates; if a ≥ b, it suffices to consider b and c. In both cases we have reduced the problem to a simpler one: computing the minimum of two numbers. Assuming the existence of a function min2 that does this, we write for the initial problem:

    int min3(int a, int b, int c) { return a < b ? min2(a, c) : min2(b, c); }

[...] c ? a : min2(b, c); }

In a complete program we write the functions in the order min2, med3ord, mediana, because each one uses the one before it.

2 Recursion

Programarea calculatoarelor, course notes, Marius Minea. Recursion, preliminary version, 9 March 2008.

In programming languages, recursion is a fundamental concept that substantially extends their expressive power: recursion allows us to write programs that could not be expressed with only the basic notions of sequencing and decision presented so far. For solving problems in practice, recursion is very important because it lets us describe the solution of a complex problem in terms of one or more problems of the same kind, but simpler. It is thus closely tied to the principle of decomposition into subproblems (divide et impera) in designing solutions.

2.1 Definition and examples

A notion is recursive if it is used in its own definition. We know an example from mathematics: recurrent sequences, where a term of the sequence is defined by a relation involving the previous terms. Examples are:
• the arithmetic progression: $x_0 = a$, $x_n = x_{n-1} + p$ for $n > 0$
• the geometric progression: $x_0 = b$, $x_n = q \cdot x_{n-1}$ for $n > 0$
These are first-order recurrences, in which the term being defined depends only on the immediately preceding term. Other, more complex recurrences are:
• the Fibonacci sequence: $F_0 = F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$ (a second-order recurrent sequence)
• the binomial coefficients: $C_n^0 = C_n^n = 1$ for $n \ge 0$, $C_n^k = C_{n-1}^k + C_{n-1}^{k-1}$ for $0 < k < n$

[...]

$x^n = \begin{cases} 1 & n = 0 \\ x \cdot x^{n-1} & \text{otherwise } (n > 0) \end{cases}$

Having a definition with two cases, the function can be expressed in C with the conditional operator, like the earlier non-recursive examples:

    float pwr(float x, unsigned n) { return n == 0 ? 1 : x * pwr(x, n-1); }

For the exponent we used the type unsigned (a C keyword), which corresponds to the natural (nonnegative) numbers; any nonzero value is therefore positive and handled correctly on the "otherwise" branch. Writing the function pwr required no new language features. All that matters is that the language allows a function to be called within its own body (we know that calling a function that is already declared is allowed). In C, once the function header has been written as part of its complete definition, the function's name, type and parameters are already known. The header thus constitutes a declaration of the function (even before its body is written), which is enough to permit the recursive call.

2.3 The function call mechanism. The recursive call

Although writing this first recursive example required no new language elements, understanding recursion correctly requires more detail about the function call mechanism. We begin with a non-recursive example: the function
    int sqr(int x) { return x * x; }
and the expression sqr(3 * sqr(2)). The expression is a call to sqr. Before the call, the function's argument must be evaluated, to know the value whose square is to be computed. The argument is a product in which one of the factors is itself a function call expression. So, of the whole expression, sqr(2) is evaluated first, then 3 is multiplied by the result (4), and with the value 12 the second call to sqr is made, yielding 144. Although the outer call to sqr contains a subexpression with a call to the same function, the expression is not recursive, since the value of sqr is computed directly from the value of the parameter passed to it. The value of sqr(2) is needed as an argument for the call, not in the function body as in recursive definitions.

The function call

Evaluating a function call is triggered when
the function's value is needed in evaluating an expression (including in executing a statement of the form expression ;). It proceeds in the following steps:
• all expressions that make up the function's arguments are evaluated. So any function calls appearing in the arguments are carried out before the call to the function in question
• the argument values are assigned to the formal parameters in the function header, with the necessary type conversions (for example integer to real). Type compatibility of the argument expressions is checked already at compile time, if the types of the formal parameters are known, which is one more reason to require that a function be declared before it is used
• the function body is executed, with the formal parameters having the initial values above. On reaching a return statement, the function's execution ends with the value obtained by evaluating the given expression
• program execution returns to the call site, where the value returned by the function is used

Parameter passing

In C, parameters are passed to functions by value: at the moment of the call, the formal parameters take the values of the arguments (which have been evaluated); they are not replaced by the argument expressions. Consequently, for the call discussed above, the return statement evaluates the expression 12 * 12 and not (3 * sqr(2)) * (3 * sqr(2)). In other words, a function works with (numeric) values, not with symbolic expressions. The expression 3 * sqr(2) is evaluated only once, before the syntactically outer call to sqr, so the call sqr(2) has already finished by the time of the second call, sqr(12). This can be visualized by augmenting sqr with a print statement, a useful practice for tracing program execution.

    #include <stdio.h>
    int sqr(int x) { printf("calculam patratul lui %d\n", x); return x * x; }
    int main(void) { printf("sqr(3 * sqr(2)) = %d\n", sqr(3*sqr(2))); return 0; }

    calculam patratul lui 2
    calculam patratul lui 12
    sqr(3 * sqr(2)) = 144

The result of running
the program (shown below the source text) also highlights, for the printf call in main, that arguments are evaluated before the function starts executing. Evaluating the second argument produces the two lines printed in the calls to sqr. These lines appear even before the ordinary text portion of the first argument (the format) is written, and writing it is exactly what executing printf consists of.

Returning values

On reaching a return statement, the function's execution ends; any statements that follow in the function body are no longer executed. If a function's execution ends by reaching the final } without executing a return statement, and the program uses the function's value, the effect is undefined (the program behaves unpredictably). A function that does not provide a return value in every situation is logically flawed.

The recursive call

We discuss the mechanism of the recursive call taking as an example the computation of $5^3$ with the power function. In the call pwr(5, 3) (x = 5, n = 3), the conditional expression leads to evaluating 5 * pwr(5, 2). This requires a new call to pwr, this time with parameters x = 5, n = 2. The process repeats with pwr(5, 1) up to the call pwr(5, 0), for which the value is computed directly: 1. From this call, control returns to the place where it was made: the evaluation of 5 * pwr(5, 0), which can now be carried out. The call pwr(5, 1) returns the value 5, used in computing 5 * pwr(5, 1) for the value of pwr(5, 2). Finally, this value, 25, is used in the expression 5 * pwr(5, 2) to compute the value 125 of pwr(5, 3).

    pwr(5, 3)
      call ↓  ↑ 125
    5 * pwr(5, 2)
      call ↓  ↑ 25
    5 * pwr(5, 1)
      call ↓  ↑ 5
    5 * pwr(5, 0)
      call ↓  ↑ 1
    1

Following the call sequence, we see that at a given moment several different calls to the same function can be in progress. Each call represents a distinct instance (copy) of the function, with its own parameter values, those received at
the moment of the call. The same happens in a pencil-and-paper computation, by "unfolding" the recurrence formula: to compute $5^3$, we replace x by 5 and n by 3. We must then compute $5^2$: applying the formula again, we replace n by 2; this does not, however, affect the initial instance of the problem ($5^3$), where n is still 3.

In the given example, the recursion has a linear structure: each recursive call generates a single new call, until it stops at the base case. At that moment all the calls, 4 in number, are active. Returning happens in the reverse order of calling: from each call, control returns to the instance that made the call. There, execution resumes in the context from before the call: that is, from the place where the call was made (where the returned value is used), and with the parameter values belonging to that instance. As with any function call, information is passed to the called function through parameters, and back to the call site (the calling function) through the result. In the given example a chain is created in which, on return, the value returned by each call is used in the calling instance to compute its own result, which is passed further back toward its call site. The final result is thus the outcome of a computation in which each of the called instances performs one step. This is the essence of recursion and at the same time its power in problem solving: it allows the solution to be expressed indirectly, through simple steps, one after another, without requiring a direct formulation of a complex solution.

2.4 The elements of a recursive definition

In a correct recursive definition the following components can be identified:
• The base case. It handles the (simplest) situations in which the recursive notion is defined directly. Examples: for recurrent sequences, the first term (or several, for recurrences of order > 1); for lists, the empty list, or the list with one element; for expressions, constants and identifiers
• The recurrence
relation: the properly recursive part of the definition, in which the notion being defined also appears in the body of the definition. Examples: the recurrence formulas for sequences; the definition branch "a list is an element followed by a list"; for expressions, the definition variants with a parenthesized expression, a function call (with expressions as parameters), and those with expressions composed using operators
• Termination of the recursion. If the definition follows a branch that again contains the notion being defined, then the definition must be applied again. A notion is correctly defined recursively if this process always stops. To be rigorous, a recursive definition must be accompanied by a proof that applying the definition stops after a finite number of steps.

It follows that a recursive definition cannot be correct without a base case, because one would never reach a point where the notion can be defined directly. The base case and the recursive relation are in fact alternatives of the same definition. This is explicit in the syntactic rules that define language constructs such as expressions. For a recurrent sequence we can highlight this using braces:

$x_n = \begin{cases} a & n = 0 \\ x_{n-1} + p & n > 0 \end{cases}$

For termination of the recursion, the most common argument uses a measure (quantity) that decreases at each application of the definition, until it reaches a value for which the definition is given directly. For recurrent sequences, this quantity is the index n of the general term $x_n$ itself.

2.5 Other examples of recursion

Sequences viewed recursively

Notions from outside mathematics, sometimes very simple ones, also lend themselves to recursive definitions. A frequently encountered pattern of definition relies on the fact that iteration (repetition) can be defined through recursion. We can thus define: A sequence (string, list) is either an element, or a sequence followed by an element. Sometimes it is useful to include in the definition the empty sequence, which arises
naturally in various operations (as the neutral element for concatenation; when initializing a list or after deleting all its elements, etc.): A sequence is either an empty sequence, or an element followed by a sequence. The two variants differ both in the base case (one element or zero) and in the position of the recursive notion in the definition: the first variant is left-recursive (the recursively defined notion "sequence" comes first in the rewriting "sequence followed by an element"), and the second is right-recursive, because in the expansion "element followed by a sequence" the defined notion "sequence" is in the last position. The left- and right-recursion patterns lead to different processing in a program, which we study and compare below.

Recursion in language syntax

Recursion appears naturally in the precise definition of the syntax of programming languages. Many language elements involve repetition, which can be expressed recursively. Thus a function header can be defined (within the limits of what has been presented so far) as:

    function-header ::= type identifier ( parameters )
    parameters ::= void | param-list
    param-list ::= type identifier | type identifier , param-list

We conventionally used the symbols ::= for definition and | for alternative. This way of describing the syntactic rules, that is, the grammar of a language, is called Backus-Naur form (BNF). We can also define recursively another fundamental notion, the expression. From what has been presented so far:

    expression ::= constant | identifier | identifier ( arguments )
                 | - expression | expression binary-operator expression
                 | expression ? expression : expression | ( expression )
    arguments ::= ε | argument-list
    argument-list ::= expression | expression , argument-list

where ε conventionally denotes the alternative with empty content (no language symbols), here for calls of the form function(), with no arguments between the parentheses. For a rigorous definition, we must state that any language construct must be defined by a finite number of rule applications. Thus -(2 + 3) is an expression: one can apply in turn the rules expression ::= - expression, expression ::= ( expression ), expression ::= expression + expression, and twice the rule expression ::= constant.

2.6 Two patterns of recursive computation

Let us examine another typical example of recursive computation, the factorial. We have:

$n! = \begin{cases} 1 & n = 0 \\ n \cdot (n-1)! & \text{otherwise } (n > 0) \end{cases}$

    unsigned fact(unsigned n) { return n == 0 ? 1 : fact(n-1) * n; }

In evaluating fact for n > 0, the last operation performed is the multiplication by n, the rest of the computation having been done earlier in the call fact(n-1) (and the other recursive calls resulting from it). The order of the computations is indicated by the parentheses in the expression

$n! = ((((1 \cdot 1) \cdot 2) \cdot \ldots) \cdot (n-1)) \cdot n$

Expanding the definition for n ≥ 2, we obtain $n! = ((n-2)! \cdot (n-1)) \cdot n$. Before evaluating fact(n-2) we already know that both n and n-1 appear in the product, but as the function is written, they are not multiplied directly: fact(n-2) is called first, the result is multiplied by n-1 within the call fact(n-1), and only at the end is the multiplication by n performed, for the result of fact(n). Starting from this observation, we rewrite the factorial using the associativity of multiplication, so as to perform as many multiplications as possible as soon as the factors become available:

$n! = 1 \cdot (2 \cdot (\ldots \cdot ((n-1) \cdot n)))$

Transcribing into C, we would like to perform the multiplication (n-1)*n within the function call for n-1, before recursively computing the factorial for n-2. In the call for n-2, the result of (n-1)*n could then be multiplied by n-2, and so on. For this, at each recursive call we need to pass, besides the value of n, the result of the multiplications already performed. We thus obtain:

    unsigned fact_r(unsigned n, unsigned r) { return n == 0 ? r : fact_r(n-1, r * n); }

The parameter r represents the partial result computed so far. At each recursive call, it is multiplied by the current value of n and the result is passed on to the call for n-1. When n==0, all multiplications have already been performed; the result is accumulated in r and can be returned. On the recursive branch, no further computation is performed on return: the value coming from the recursive call is already the complete result and is returned directly to the caller. From the first call, for n, the partial result we want to pass to the call for n-1 is just the value n. So r should initially be 1, and to compute n! we call fact_r(n, 1). In fact, we have defined a more general function: fact_r(n, r) computes r · n!.
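As a minimal sketch of the claim that fact_r(n, r) computes r · n!, the two variants can be placed side by side. Both follow the definitions in the text; the sketch assumes small arguments and does not guard against unsigned overflow for larger n.

```c
/* Plain recursion: the multiplication by n happens after fact(n - 1) returns. */
unsigned fact(unsigned n)
{
    return n == 0 ? 1 : fact(n - 1) * n;
}

/* Accumulator version: r holds the product of the factors processed so far,
   so the value of the recursive call is already the final result. */
unsigned fact_r(unsigned n, unsigned r)
{
    return n == 0 ? r : fact_r(n - 1, r * n);
}
```

For example, fact(5) and fact_r(5, 1) both give 120, while fact_r(3, 2) gives 2 · 3! = 12.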
To avoid burdening the user with the extra parameter required by the method of computation, we define the function fact2 with a single parameter. It merely "wraps" the initial call fact_r(n, 1):

    unsigned fact2(unsigned n) { return fact_r(n, 1); }

For the factorial, we can choose between the two ways of writing it. In the first, the computation is done on returning from the recursive calls, and these need no previously computed partial result. In the second, the partial result is passed down as a parameter and updated before each recursive call. There are, however, situations in which part of the processing is performed before each recursive call, making it necessary to pass "downward", as the recursion advances, values that will be used later in computations.

Reversing the digits of a number

We write a function that takes an unsigned integer and transforms it into the number with the same decimal digits in reverse order. We build the solution starting from an example: 1472. The last digit, 2, becomes the first digit of the result. We place the last digit of the remaining 147 after 2, obtaining 27 = 2 · 10 + 7. From the remaining number, 14, we place the last digit after 27, obtaining 27 · 10 + 4 = 274, and so on. In words, the recursive processing step can be stated as: the result of the reversal, if n remains to be reversed and reversing the last digits has already produced r, is the same as the result of reversing n / 10 with the intermediate value 10 · r + n mod 10. Although the problem statement has a single parameter, the recursive solution obtained manipulates two quantities, so we write a recursive function with two parameters. Processing stops when n is 0 (no digits left to reverse), and initially r is the value with no digits, hence also 0; thus, for the first digit c, the expression 10 · 0 + c gives the desired value c.

    #include <stdio.h>
    unsigned revnum_r(unsigned n, unsigned r) { return n == 0 ? r : revnum_r(n / 10, 10 * r + n % 10); }
    unsigned revnum(unsigned n) { return revnum_r(n, 0); }
    int main(void)
    {
      printf("%u\n", revnum(1472));    // %u for printing an unsigned
      return 0;
    }

We want as solution a function with a single parameter, as in the statement, so as not to burden the user with an extra parameter for the intermediate value. We therefore wrote the function revnum, which calls the recursive function revnum_r(n, 0) with the initial value needed for the second parameter.

The greatest common divisor

Euclid's algorithm for computing the greatest common divisor of two positive integers is a classic example of an algorithm expressed recursively, in which a problem is solved by reduction to a simpler instance of the same problem. Stated informally, the algorithm is:
- if the numbers are equal, the result is their common value
- otherwise, subtract the smaller number from the larger, and repeat the procedure with the new numbers
The latter phrase ("repeat the procedure") indicates the recursive approach: the solution is obtained by solving the same problem for new values of the numbers (smaller ones, so after a finite number of steps the base case is reached). We can therefore write by cases:

$\mathrm{cmmdc}(a, b) = \begin{cases} a & a = b \\ \mathrm{cmmdc}(a - b, b) & a > b \\ \mathrm{cmmdc}(a, b - a) & \text{otherwise } (a < b) \end{cases}$

    unsigned cmmdc(unsigned a, unsigned b) { return a == b ? a : a > b ? cmmdc(a - b, b) : cmmdc(a, b - a); }

Although we transcribed directly from words into a recursive formula and then into code, one aspect has been left unhandled: the original statement is for positive numbers, while the type unsigned also allows the value 0. Calling the function written above (even accidentally) with a zero parameter leads to an infinite sequence of recursive calls, because subtracting 0 leaves the other number unchanged and we repeat the same call (until, depending on the runtime environment, the program will probably be terminated forcibly after exhausting memory resources). It is important that the functions we write be robust and not produce unforeseen, catastrophic errors. We therefore rewrite the function taking into account that 0 is divisible by any number, so cmmdc(a, 0) = cmmdc(0, a) = a:

    unsigned cmmdc(unsigned a, unsigned b) { return b == 0 ? a : a == 0 ? b : a > b ? cmmdc(a - b, b) : cmmdc(a, b - a); }

In this version, the case a = b ≠ 0 takes the last branch, resulting in the call cmmdc(a, 0), which returns a.

2.7 Recursive computation of series

Computing the partial sum of a series lends itself naturally to a recursive formulation. Denoting the general term by $t_n$ and $s_n = \sum_{k=0}^{n} t_k$, we immediately obtain $s_n = s_{n-1} + t_n$ (for $n \ge 1$), and so:

$s_n = \begin{cases} t_0 & n = 0 \\ s_{n-1} + t_n & \text{otherwise } (n > 0) \end{cases}$

Having a direct formula for the term $t_n$ of the series (expressed as a function of n), we can turn the formula above directly into a recursive function s, which calls the function t for the computation. For the simple example $t_n = 1/n$ (for $n \ge 1$, with $t_0 = 0$), we can write the following program:

    #include <math.h>
    #include <stdio.h>
    double t(unsigned n) { return n == 0 ? 0 : 1.0 / n; }
    double s(unsigned n) { return n == 0 ? t(0) : s(n-1) + t(n); }
    int main(void)
    {
      printf("Constanta lui Euler e aprox %f\n", s(1000)-log(1000));
      return 0;
    }

From mathematics we know that the series $\sum 1/n$ has infinite sum, and grows approximately like the natural logarithm of n. More precisely,

$\lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right) = \gamma \approx 0.5772$

The approximation computed by the program has the first 3 decimals exact. The type double used in the program represents, like float, real numbers, but with better precision (hence the name), and is recommended in computations to reduce the accumulation of rounding errors. It is the standard type used by the mathematical library functions declared in math.h (such as log for the natural logarithm) and the default type for real constants. Writing 1.0 / n for the mathematical term 1/n is necessary to obtain real division: since the operand 1.0 is real, the integer n is implicitly converted as well. Otherwise, 1 / n would have meant integer division, with remainder, giving the value 0 for n > 1. Of course, in this program we could have written more simply, directly

    double s(unsigned n) { return n == 0 ? 0 : s(n-1) + 1.0 / n; }

The version presented, however, highlights the general form of the function s for a recursively expressed sum, and the program can be adapted to another series simply by replacing the function t. The same general observation holds as was first made for the factorial example about the two ways of defining a recursive computation. The function s as written above performs the computation on returning from the recursion, the last addition being $s_{n-1} + t_n$, corresponding to a grouping of the computations of the form:

$s_n = (((t_0 + t_1) + t_2) + \ldots) + t_n$

The other alternative is to pass as an additional parameter a partial result already computed (initially zero), to which the current term is added at each step. Thus the terms are actually added in the reverse order:

$s_n = t_0 + (t_1 + (\ldots + (t_{n-1} + t_n)))$

The function is written:

    double s2(unsigned n, double res) { return n == 0 ? res + t(0) : s2(n-1, res + t(n)); }

To keep the same natural one-parameter form of the function used directly by the programmer, we redefine the function s, calling s2 with the value 0 as the partial sum already computed:

    double s(unsigned n) { return s2(n, 0); }

This code can replace the function s in the previous program, keeping the function t.

2.8 Computing approximations to a given precision

Most of the examples of recursive numerical computation given so far have the same structure: the term of a sequence defined by a recurrence relation is computed, and the number of recursive calls needed is determined by the order (index) of the term, given at the first call. In mathematics and in practice, however, we often encounter cases where a computation is wanted to a certain precision: a sequence of approximations is generated, and when the current approximation has reached the desired precision, the computation stops. As an example we give an approximation method for computing the square root $\sqrt{x}$. A well-known mathematical formula (it can be derived from Newton's method, but in this particular case is much older) gives the sequence of approximations $a_{n+1} = (a_n + x / a_n) / 2$, with an arbitrary initial approximation (for example $a_0 = 1$). The key element of the solution is formulating the problem so as to identify and bring out its recursive character: more precisely, the parameters and the base case (stopping the sequence of recursive calls). The stopping condition is not given by the initial approximation but, as follows from the statement, by the precision reached. To compute it, we need the current approximation, which thus becomes a parameter of the problem (with the initial value given by $a_0$). We can then state the solution in words as follows: the function sought returns an approximation of given precision of $\sqrt{x}$, knowing a current approximation $a_n$ also given as a parameter. If the current approximation is good enough, it can be
returnataa (cazul de bazaa) Dacaa nu, rezultatul va fi dat de aceeasi functie de calcul, apelataa tot pentru x, dar cu o aproximatie curentaa mai bunaa, dataa de an+1 #include #include double rad(double x, double a) return fabs(a - x a) 0 • progresia geometrica:  o = b, xn = q   xn-i pentru n > 0 Acestea sunt recurente de ordinul i, in care termenul definit depinde doar de termenul imediat anterior Alte recurente mai complexe sunt: • sirul lui Fibonacci: F, = Fi = 1, Fn = Fn i + Fn-2 pentru n > 2 (un sir recurent de ordinul ii) • coeficientii binomiali: = С" = 1 pentru n > 0, C(t = Ct) i + pentru 0 0) Avand o definitie pe doua variante, functia se poate exprima in C cu operatorul conditional ca si exemplele nerecursive dinainte: float pwr(float x, unsigned n) return n==0 ? 1 : x * pwr(x, n-1); Pentru exponent am folosit tipul unsigned (cuvant cheie in limbajul C), corespunzand numerelor naturale (nenegative); orice valoare diferita de zero e deci pozitiva si tratata corect pe ramura "altfel" Pentru scrierea functiei pwr nu au fost necesare facilitati noi de limbaj Esential e doar ca limbajul sa permita ca in corpul unei functii sa fie apelata chiar aceasta functie (stim ca e permisa apelarea unei functii care e deja declarata) in limbajul C, dupa ce a fost scris antetul functiei ca parte a definitiei ei complete se cunosc deja numele functiei, tipul si parametrii ei Antetul reprezinta deci o declaratie a functiei (chiar inainte de a fi fost scris corpul ei), ceea ce e suficient pentru a permite apelul recursiv 2 3 Mecanismul apelului de functie Apelul recursiv Desi scrierea acestui prim exemplu recursiv nu a necesitat elemente noi de limbaj, pentru a intelege corect recursivitatea sunt necesare mai multe detalii despre mecanismul apelului de functie incepem cu Recursivitate Versiune preliminara 1 12 martie 2009 Programarea calculatoarelor Note de curs Marius Minea un exemplu nerecursiv: functia int sqr(int x) { return x * x; } si expresia sqr (3 * sqr(2)) Expresia 
is a call to the function sqr. Before the call, the function's argument must be evaluated, to find the value whose square is to be computed. The argument is a product in which one of the factors is itself a function call expression. Therefore, of the whole expression, sqr(2) is evaluated first, then 3 is multiplied by the result (4), and the second call to sqr is made with the value 12, yielding 144. Although the outer call to sqr contains a subexpression with a call to the same function, the expression is not recursive, since the value of sqr is computed directly from the value of the parameter passed. The value of sqr(2) is needed as an argument for the call, not inside the function body as in recursive definitions.

The function call
The evaluation of a function call is triggered when the function's value is needed in evaluating an expression (including when executing a statement of the form expression ;). It proceeds in the following steps:
• all the expressions that make up the function's arguments are evaluated. Hence any function calls appearing in the arguments are carried out before the call of the function in question
• the argument values are assigned to the formal parameters from the function header, with the necessary type conversions (for example, integer to real). Type compatibility of the argument expressions is already checked at compile time, if the types of the formal parameters are known; this is one more reason to require that a function be declared before it is used
• the function body is executed, with the formal parameters having the initial values described above. On reaching a return statement, execution of the function ends with the value obtained by evaluating the given expression
• program execution returns to the call site, where the value returned by the function is used

Parameter passing
In C, parameters are passed to functions by value: at the moment of the call, the formal parameters take the values of the arguments (which have already been evaluated); they are not replaced by the argument expressions. In
consequence, for the call discussed above, the return statement evaluates the expression 12 * 12 and not (3 * sqr(2)) * (3 * sqr(2)). In other words, a function works with (numeric) values, not with symbolic expressions. The expression 3 * sqr(2) is evaluated only once, before the syntactically outer call to sqr, so the call sqr(2) has already finished by the time of the second call, sqr(12). This can be visualized by augmenting the function sqr with a print statement, a useful practice when tracing program execution.

    #include <stdio.h>
    int sqr(int x)
    {
        printf("computing the square of %d\n", x);
        return x * x;
    }
    int main(void)
    {
        printf("sqr(3 * sqr(2)) = %d\n", sqr(3*sqr(2)));
        return 0;
    }

    computing the square of 2
    computing the square of 12
    sqr(3 * sqr(2)) = 144

The result of running the program (shown below the source text) also illustrates, for the printf call in main, that the arguments are evaluated before the function starts executing. Evaluating the second argument produces the two lines printed in the calls to sqr. These lines appear before even the ordinary text portion of the first argument (the format) is written, and writing it is precisely what executing printf consists of.

Returning values
On reaching a return statement, execution of the function ends; any statements that follow in the function body are no longer executed. If execution of a function ends by reaching the final brace } without executing a return statement, and the program uses the function's value, the effect is undefined (the program behaves unpredictably). A function that does not provide a return value in every situation is logically incorrect.

The recursive call
We discuss the mechanism of the recursive call taking as an example the computation of $5^3$ with the power function. In the call pwr(5, 3) (x = 5, n = 3), the conditional expression leads to evaluating 5 * pwr(5, 2). This requires a new call to pwr, this time with
parameters x = 5, n = 2. The process repeats with pwr(5, 1) until the call pwr(5, 0), whose value is computed directly: 1. From this call, execution returns to the place where it was made: to the evaluation of 5 * pwr(5, 0), which can now be carried out. The call pwr(5, 1) returns the value 5, used in computing 5 * pwr(5, 1) for the value of pwr(5, 2). Finally this value, 25, is used in the expression 5 * pwr(5, 2) to compute the value 125 of pwr(5, 3).

    pwr(5, 3)                      returns 125
      5 * pwr(5, 2)                returns 25
          5 * pwr(5, 1)            returns 5
              5 * pwr(5, 0)        returns 1

Following the call sequence, we see that several different calls to the same function may be in progress at a given moment. Each call is a distinct instance (copy) of the function, with its own parameter values, those received at the moment of the call. The same happens in pencil-and-paper calculation, when "unfolding" the recurrence formula: to compute $5^3$, we replace x with 5 and n with 3. Then $5^2$ must be computed: applying the formula again, we replace n with 2; this does not affect the initial instance of the problem ($5^3$), where n is still 3.

In the given example, the recursion has a linear structure: each recursive call generates a single new call, until it stops at the base case. At that moment all the calls, 4 in number, are active. Returning happens in the reverse order of calling: from each call, execution returns to the instance that made the call. There, execution resumes in the context from before the call: from the place where the call was made (where the returned value is used), and with the parameter values belonging to that instance.

As with any function call, information is passed to the called function through parameters, and back to the call site (the calling function) through the result. In this example a chain is formed in which, on return, the value returned by each call is used in the calling instance to compute its own result, which is passed further back toward the call
site. The final result is thus the effect of a computation in which each of the called instances performs one step. This is the essence of recursion and at the same time its power in problem solving: it allows the solution to be expressed indirectly through simple steps, one at a time, without requiring the direct formulation of a complex solution.

2.4 The elements of a recursive definition
In a correct recursive definition the following components can be identified:
• The base case. It handles the (simplest) situations in which the recursive notion is defined directly. Examples: for recurrent sequences, the first term (or several, for recurrences of order > 1); for lists, the empty list, or the one-element list; for expressions, constants and identifiers
• The recurrence relation: the properly recursive part of the definition, in which the notion being defined also appears in the body of the definition. Examples: the recurrence formulas for sequences; the definition branch "a list is an element followed by a list"; for expressions, the definition variants with a parenthesized expression, a function call (with expressions as parameters) and those with expressions composed with operators
• Termination of the recursion. If the definition follows a branch that again contains the notion being defined, the definition must be applied again. A notion is correctly defined recursively if this process always stops. To be rigorous, a recursive definition must be accompanied by a proof that applying the definition stops after a finite number of steps

It follows that a recursive definition cannot be correct without a base case, since it would never reach a point where the notion can be defined directly. The base case and the recursive relation are in fact alternatives of the same definition. This is explicit in the syntax rules that define language constructs such as expressions. For a
recurrent sequence we can highlight this using braces: $x_n = \begin{cases} a & n = 0 \\ \ldots & n > 0 \end{cases}$

For termination of the recursion, the most common argument uses a measure (quantity) that decreases with each application of the definition, until it reaches a value for which the definition is given directly. For recurrent sequences, this quantity is the very index n of the general term $x_n$.

2.5 Other examples of recursion
Sequences viewed recursively
Notions from outside mathematics, sometimes very simple ones, also lend themselves to recursive definitions. A frequently encountered definition pattern relies on the fact that iteration (repetition) can be defined through recursion. We can thus define:

    A sequence (string, list) is either an element, or a sequence followed by an element.

Sometimes it is useful to include the empty sequence in the definition, since it arises naturally in various operations (as the neutral element of concatenation; when initializing a list or after deleting all its elements, etc.):

    A sequence is either the empty sequence, or an element followed by a sequence.

The two variants differ both in the base case (one element or zero) and in the position of the recursive notion in the definition: the first variant is left-recursive (the recursively defined notion "sequence" comes first in the rewriting "sequence followed by an element"), and the second is right-recursive, because in the expansion "element followed by a sequence" the defined notion "sequence" is in the last position. The left-recursive and right-recursive patterns lead to different processing in a program, which we study and compare below.

Recursion in the syntax of languages
Recursion appears naturally in the precise definition of the syntax of programming languages. Many language elements involve repetition, which can be expressed recursively. Thus, a function header can be defined (within the limits of what has been presented so far) as:

    function-header ::= type identifier ( parameters )
    parameters      ::= void | param-list
    param-list      ::= type identifier | type identifier , param-list

By convention we used the symbols ::= for definition and | for alternative. This way of describing
the syntax rules, that is, the grammar of a language, is called Backus-Naur form (BNF). We can also define recursively another fundamental notion, the expression. From what has been presented so far:

    expression    ::= constant | identifier | identifier ( arguments )
                    | - expression | expression binary-operator expression
                    | expression ? expression : expression | ( expression )
    arguments     ::= ε | argument-list
    argument-list ::= expression | expression , argument-list

where ε conventionally denotes the alternative with empty content (no language symbols), used here for calls of the form function(), with no arguments between the parentheses. For a rigorous definition, we must specify that any language construct must be defined by a finite number of rule applications. Thus, -(2 + 3) is an expression: one can apply in turn the rules expression ::= - expression, expression ::= ( expression ), expression ::= expression + expression, and twice the rule expression ::= constant.

2.6 Two patterns of recursive computation
Let us examine another typical example of recursive computation, the factorial. We have $n! = 1$ for $n = 0$, and $n! = (n-1)! \cdot n$ otherwise ($n > 0$).

    unsigned fact(unsigned n) { return n == 0 ? 1 : fact(n-1) * n; }

In evaluating fact for n > 0, the last operation performed is the multiplication by n, the rest of the computation having been done earlier in the call fact(n-1) (and in the other recursive calls that result from it). The order of the computations is indicated by the parentheses in the expression $n! = ((((1 \cdot 1) \cdot 2) \cdot \ldots) \cdot (n-1)) \cdot n$. Expanding the definition for $n \ge 2$, we obtain $n! = ((n-2)! \cdot (n-1)) \cdot n$.

Before evaluating fact(n-2) we know that both n and n-1 appear in the product, but as the function is written, they are not multiplied directly: fact(n-2) is called first, its result is multiplied by n-1 within the call fact(n-1), and only at the end is the multiplication by n performed, for the result of fact(n). Starting from this observation, we rewrite the factorial using the associativity of multiplication, so as to perform as many multiplications as possible as soon as the factors become available: $n! = 1 \cdot (2 \cdot (\ldots \cdot ((n-1) \cdot n)))$.

Transcribing into C, we would like to perform the multiplication (n-1)*n within the function call for n-1, before recursively computing the factorial for n-2. In the call for n-2 the result of (n-1)*n could then be multiplied by n-2, and so on. For this, we need to pass at each recursive call, besides the value of n, the result of the multiplications already performed. We thus obtain:

    unsigned fact_r(unsigned n, unsigned r) { return n == 0 ? r : fact_r(n-1, r * n); }

The parameter r represents the partial result computed so far. At each recursive call, it is multiplied by the current value of n and the result is passed on to the call for n-1. When n == 0, all the multiplications have already been performed; the result is accumulated in r and can be returned. On the recursive branch, no further computation is done on return: the value coming from the recursive call is already the complete result and is returned directly to the caller.

From the first call for n, the partial result we want to pass to the call for n-1 is just the value n. So initially r should be 1, and to compute n! we call fact_r(n, 1). In fact, we have defined a more general function: fact_r(n, r) computes $r \cdot n!$.
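The way the partial result accumulates on the way down can be made visible with a print statement, the same tracing practice used earlier for sqr. The following is only an illustrative sketch: the name fact_r_trace and the format of the trace lines are assumptions made for this example and do not appear in the notes.

```c
#include <stdio.h>

/* Hypothetical traced variant of the accumulator factorial:
   prints the parameters of each call before recursing.
   Like fact_r, it returns r * n!. */
unsigned fact_r_trace(unsigned n, unsigned r)
{
    printf("fact_r(%u, %u)\n", n, r);
    return n == 0 ? r : fact_r_trace(n - 1, r * n);
}
```

Called as fact_r_trace(4, 1), the function prints the lines fact_r(4, 1), fact_r(3, 4), fact_r(2, 12), fact_r(1, 24), fact_r(0, 24) and returns 24: each multiplication has already been performed when the next call starts, and the last call simply hands back the accumulated value.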
To avoid burdening the user with the extra parameter required by the method of computation, we define the function fact2 with a single parameter. It merely "wraps" the initial call fact_r(n, 1):

    unsigned fact2(unsigned n) { return fact_r(n, 1); }

For the factorial, we can choose between the two variants. In the first, the computation is done on return from the recursive calls, and these need no partial result computed beforehand. In the second, the partial result is passed down as a parameter, and updated before each recursive call. There are, however, situations in which part of the processing is done before each recursive call, making it necessary to pass "downward", as the recursion advances, values that will be used later in the computation.

Reversing the digits of a number
We write a function that takes an unsigned integer and transforms it into the number with the same decimal digits in reverse order. We write the solution starting from an example: 1472. The last digit, 2, becomes the first digit of the result. We place the last digit of the remaining number, 147, after 2, obtaining 27 = 2 · 10 + 7. From the remaining number, 14, we place the last digit after 27, obtaining 27 · 10 + 4 = 274, and so on.

In words, the recursive processing step can be expressed as follows: the result of the reversal, if n remains to be reversed and reversing the last digits has already produced v, is the same as the result of reversing n / 10 with the intermediate value 10 · v + n mod 10. Although the problem statement has a single parameter, the recursive solution obtained manipulates two quantities, so we write a recursive function with two parameters (the intermediate value v is the parameter r in the code). Processing stops when n is 0 (no digits are left to reverse), and initially v is the value with no digits, hence also 0; thus, for the first digit c, the expression 10 · 0 + c gives the desired value c.

    #include <stdio.h>
    unsigned revnum_r(unsigned n, unsigned r) { return n == 0 ?
        r : revnum_r(n / 10, 10 * r + n % 10); }
    unsigned revnum(unsigned n) { return revnum_r(n, 0); }
    int main(void)
    {
        printf("%u\n", revnum(1472));    // %u prints an unsigned value
        return 0;
    }

We want the solution to be a function with a single parameter, as in the problem statement, so as not to burden the user with an extra parameter for the intermediate value. We therefore wrote the function revnum, which calls the recursive function revnum_r(n, 0) with the initial value needed for the second parameter.

The greatest common divisor
Euclid's algorithm for computing the greatest common divisor of two positive integers is a classic example of an algorithm expressed recursively, in which a problem is solved by reduction to a simpler instance of the same problem. Stated informally, the algorithm is:
- if the numbers are equal, the result is their common value
- otherwise, subtract the smaller number from the larger one, and repeat the procedure with the new numbers
The latter phrasing ("repeat the procedure") points to the recursive approach: the solution is obtained by solving the same problem for new values of the numbers (smaller ones, so after a finite number of steps the base case is reached). We can therefore write, by cases:

$\mathrm{cmmdc}(a, b) = \begin{cases} a & a = b \\ \mathrm{cmmdc}(a - b, b) & a > b \\ \mathrm{cmmdc}(a, b - a) & \text{otherwise } (a < b) \end{cases}$

    unsigned cmmdc(unsigned a, unsigned b) { return a == b ? a : a > b ?
        cmmdc(a - b, b) : cmmdc(a, b - a); }

Although we transcribed directly from words into the recursive formula and then into code, one aspect has remained unhandled: the original statement is given for positive numbers, whereas the type unsigned also allows the value 0. Calling the function written above (even accidentally) with a zero parameter leads to an infinite sequence of recursive calls, because subtracting 0 leaves the other number unchanged and we repeat the same call (until, depending on the runtime environment, the program is probably terminated forcibly after exhausting its memory resources). It is important that the functions we write be robust and not produce unforeseen and catastrophic errors. We therefore rewrite the function taking into account that 0 is divisible by any number, so cmmdc(a, 0) = cmmdc(0, a) = a:

    unsigned cmmdc(unsigned a, unsigned b)
    {
        return b == 0 ? a : a == 0 ? b : a > b ? cmmdc(a - b, b) : cmmdc(a, b - a);
    }

Written this way, the case a = b ≠ 0 enters the last branch, resulting in the call cmmdc(a, 0), which returns a.

2.7 Recursive computation of series
Computing the partial sum of a series lends itself naturally to a recursive formulation. Denoting by $t_n$ the general term, and $s_n = \sum_{k=0}^{n} t_k$, we immediately obtain $s_n = s_{n-1} + t_n$ (for $n \ge 1$), and hence:

$s_n = \begin{cases} t_0 & n = 0 \\ s_{n-1} + t_n & \text{otherwise } (n > 0) \end{cases}$

Given a direct formula for the term $t_n$ of the series (expressed as a function of n), we can directly turn the formula above into a recursive function s, which calls the function t for the computation. For the simple example $t_n = 1/n$ (for $n \ge 1$, with $t_0 = 0$), we can write the following program:

    #include <math.h>
    #include <stdio.h>
    double t(unsigned n) { return n == 0 ? 0 : 1.0 / n; }
    double s(unsigned n) { return n == 0 ?
        t(0) : s(n-1) + t(n); }
    int main(void)
    {
        printf("Euler's constant is approx. %f\n", s(1000)-log(1000));
        return 0;
    }

From mathematics we know that the series $\sum 1/n$ has an infinite sum, and grows approximately like the natural logarithm of n. More precisely, $\lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \ln n \right) = \gamma \approx 0.5772$. The approximation computed by the program has its first 3 decimals exact.

The type double used in the program represents, like float, real numbers, but with better precision (hence the name), and is recommended in computations to reduce the accumulation of rounding errors. It is the standard type used by the mathematical library functions declared in math.h (such as the function log for the natural logarithm) and the default type of real constants. Writing 1.0 / n for the mathematical term 1/n is necessary to obtain real division: since the operand 1.0 is real, the integer n is implicitly converted as well. Otherwise, 1 / n would have meant integer division, with remainder, and the value 0 for n > 1. Of course, in this program one could have written more simply, directly:

    double s(unsigned n) { return n == 0 ? 0 : s(n-1) + 1.0 / n; }

The variant presented, however, highlights the general form of the function s for a recursively expressed sum, and the program can be adapted to another series simply by replacing the function t.

The same general remark made first for the factorial example, about the two variants of defining the recursive computation, applies here. The function s as written above performs the computation on return from the recursion, the last addition being $s_{n-1} + t_n$, corresponding to a grouping of the computations of the form $s_n = (((t_0 + t_1) + t_2) + \ldots) + t_n$. The other alternative is to pass as an extra parameter a partial result already computed (initially zero), to which the current term is added at each step. Thus, the terms are effectively added in reverse order: $s_n = t_0 + (t_1 + (\ldots + (t_{n-1} + t_n)))$. The function is written:

    double s2(unsigned n, double res) { return n == 0 ?
        res + t(0) : s2(n-1, res + t(n)); }

To keep the same natural single-parameter form for the function used directly by the programmer, we redefine the function s to call s2 with the value 0 as the partial sum already computed:

    double s(unsigned n) { return s2(n, 0); }

This code can replace the function s in the previous program, keeping the function t.

2.8 Computing approximations to a given precision
Most of the recursive numerical examples given so far share the same structure: the term of a sequence defined by a recurrence relation is computed, and the number of recursive calls needed is determined by the order (index) of the term, given at the first call. In mathematics and in practice, however, we often meet cases where a computation to a certain precision is wanted: a sequence of approximations is generated, and when the current approximation has reached the desired precision, the computation stops.

As an example we give an approximation method for computing the square root $\sqrt{x}$. A well-known mathematical formula (it can be derived from Newton's method, but in this particular case it is much older) gives the sequence of approximations $a_{n+1} = (a_n + x / a_n) / 2$, with an arbitrary initial approximation (for example $a_0 = 1$).

The key element of the solution is formulating the problem so as to identify and highlight its recursive character: more precisely, the parameters and the base case (stopping the sequence of recursive calls). The stopping condition is not given by the initial approximation but, as follows from the statement, by the precision reached. To compute it, we need the current approximation, which thus becomes a parameter of the problem (with the initial value given by $a_0$). We can then state the solution in words as follows: the sought function returns an approximation of given precision of $\sqrt{x}$, knowing a current approximation $a_n$ that is also given as a parameter. If the current approximation is good enough, it can be
returned (the base case). If not, the result is given by the same function, called again for x, but with a better current approximation, given by $a_{n+1}$.

    #include <math.h>
    #include <stdio.h>
    double rad(double x, double a) return fabs(a - x / a) ...

              0 1 2 3 4 5 6 7 8 9 A B C D E F
    0x20:       ! " # $ % & ' ( ) * + , - . /
    0x30:     0 1 2 3 4 5 6 7 8 9 : ; < = > ?
    0x40:     @ A B C D E F G H I J K L M N O
    0x50:     P Q R S T U V W X Y Z [ \ ] ^ _
    0x60:     ` a b c d e f g h i j k l m n o
    0x70:     p q r s t u v w x y z { | } ~

The full table has 8 rows and 16 columns (rows 0x00 and 0x10 hold non-printable control characters); this layout makes it easy to read off the ASCII code of each character as a value in base 16. By convention, the hexadecimal digits with values from 10 to 15 are represented by the letters A to F (either upper or lower case). In C, hexadecimal numbers are distinguished by the prefix 0x or 0X. Thus, from the table we see that the letter A has the ASCII code 0x41, that is, 65. The letter z has the code 0x70 + 0xA, that is, 7 · 16 + 10 = 122. We observe that in the ASCII table, the digits, the upper-case letters and the lower-case letters are represented in three contiguous, but separate, groups.

Since characters are represented in the computer as integers, when writing programs that process characters we would have to use their codes. According to the table above, to check whether a character in a string is C we would have to compare it with the number 0x43 (67). Numeric notation would be both an inconvenience and a possible source of errors and implementation dependencies.

Characters and reading them. Preliminary version, 10 March 2008. Computer Programming, lecture notes, Marius Minea

Strictly speaking, the C99 standard does not mandate the ASCII encoding; it only requires that the digits, the upper- and lower-case letters, most other common graphic signs, the space and the end of line be representable, that the representation of each character fit in one byte, and that the digits have successive values in the representation.

In C, we can refer to the value of a character (its numeric code) by enclosing the character in apostrophes, for example: 'c', '7', ':'. These values are called character constants, the third kind of constant encountered
so far, besides integer and real ones. Any printable character (including the space) can be enclosed in apostrophes, but the apostrophe itself must be prefixed with the character \ (backslash): '\'', to distinguish it from the two delimiting apostrophes. The usual control (non-printable) characters are represented by special sequences formed of \ followed by a letter that suggests the character's name:

    '\a'  alert        '\n'  newline          '\r'  carriage return
    '\b'  backspace    '\v'  vertical tab     '\''  apostrophe
    '\t'  tab          '\f'  form feed        '\\'  backslash

The character \ must be doubled to be interpreted as itself and not as the start of a special sequence, a common convention, also used for % in printf.

As an application, we write a simple function that returns the value (code) of the hexadecimal digit representing a number between 0 and 15. The function therefore returns a character code either between '0' and '9', or between 'A' and 'F'. Since the decimal digits are represented consecutively, the code for the digit of value n is '0' + n (n positions higher than the code of 0). For letters, the code is n - 10 positions higher than that of A (for example, 'A' + (14 - 10) == 'E').

    // returns the hexadecimal digit for 0 ...

We handled the error case (a call with a parameter greater than 15) by returning the value '?', which would be obvious when printed. Another option would be to return the value -1, which does not represent a character. The decision is up to the designer of the function, but it must be documented, together with the functionality, in comments.

3.2 Writing characters
Since characters are represented internally as integers, the question arises how the physical representation of a character can be produced from its code, that is, how it can be written (printed). In C this is done by the standard function

    int putchar(int c);    // declared in stdio.h

which writes the character with the given value (numeric encoding) to standard output (by default, the screen). The function putchar returns the very value of its parameter (the character written), or EOF if an error occurs, a convention
common to some standard functions which, although their main purpose is to produce a visible effect (or side effect, a notion discussed later), also return a result that the program can use if needed. The call putchar('0') thus prints the digit 0 (without apostrophes, which are used only to represent character constants in the program). We do not write putchar(48), since it is harder to understand and depends on the encoding table used. The call putchar(xdigit(12)) (with the function xdigit above) writes the letter (hexadecimal digit) C. Recall that a character constant must be enclosed in apostrophes; thus putchar(9) does not write the digit 9, but the character with value 9 (in ASCII: tab); 'a' + 5 is 'f', but in the expression a + 5, a is assumed to be an identifier (declared earlier), whose value is added to 5, and so on.

3.3 Reading characters
In a C program, a character can be read with the standard function

    int getchar(void);    // declared in stdio.h

The function needs no parameters (reading is done from standard input, by default the keyboard). It returns the value of the character read (its code). Any function that interacts with the external environment (in this case, the input supplied by the user) is subject to potential error situations that it cannot control, and must be able to report these errors. When reading, the input may be unable to supply a character: for example, if the user has finished entering data, or if the program is run with standard input redirected, reading its data from a file instead of the keyboard, and the end of the file has been reached. In such a case, the function getchar returns a special value identified by the name EOF (end of file). The value EOF is specified by the standard as a negative integer (to differ from the values of characters, which are non-negative), and is defined in the
antet stdio h care contine si declaratia functiei getchar, si a celorlalte functii standard de intrare iesire (de ex printf) in multe implementari, valoarea EOF este -1, dar nu trebuie sa ne bazam pe aceasta presupunere Ne referim la aceasta valoare speciala doar prin nume 3 4 Functii pentru clasificarea caracterelor in multe probleme dorim sa prelucram doar caractere de un anumit fel De exemplu, daca citim de la intrare un cuvant, caracter cu caracter, trebuie sa ne oprim cand caracterul citit nu mai este o litera Limbajul C ofera o serie de functii care testeaza din ce categorie face parte un caracter Toate aceste functii sunt declarate in fisierul antet ctype h Ele iau ca parametru un intreg (codul caracterului de testat), si returneaza un intreg care e: nenul daca caracterul se incadreaza in categoria dorita (corespunzatoare cu numele functiei), sau zero in caz contrar De exemplu, functia int isdigit(int c);    declarata in ctype h returneaza o valoare nenula daca parametru este un caracter de la ’ 0 ’ la ’ 9 ’ si zero pentru orice alt caracter Dam ca exemplu o functie care returneaza valoarea (de la 0 la 9) a unui caracter cifra zecimala, sau —1 (ca si cod de eroare) pentru orice alt caracter: int digitvalue(int c) { return isdigit(c) ? c - ’0’ : -1; } Expresia c - ’ 0 ’ numara cate cifre sunt intre c si 0 in tabela de caractere, ceea ce e tocmai valoarea cifrei c (de exemplu, ’7’ - ’0’ == 7) Este tocmai calculul invers celui facut in functia xdigit Acest exemplu necesita o clarificare privind valorile logice in limbajul C Ca si conditie pentru operatorul ? 
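To make the inverse relationship between the two functions concrete, here is a minimal sketch. The body of xdigit is an assumption, reconstructed only from the description in section 3.1 (codes '0' + n and 'A' + (n - 10), with '?' for values above 15); digitvalue is the function given above. The sketch also assumes an ASCII-compatible character set, in which the letters A to F are consecutive.

```c
#include <ctype.h>

/* Assumed body for xdigit, following the description in the text:
   digits map to '0' + n, letters to 'A' + (n - 10), anything else to '?'. */
int xdigit(unsigned n)
{
  return n < 10 ? '0' + n : n < 16 ? 'A' + (n - 10) : '?';
}

/* From the text: value of a decimal digit character, or -1 otherwise. */
int digitvalue(int c)
{
  return isdigit(c) ? c - '0' : -1;
}
```

For any n from 0 to 9, digitvalue(xdigit(n)) gives back n. For n from 10 to 15 the round trip yields -1, because the letters are hexadecimal digits but not decimal ones.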
This example calls for a clarification about logical values in C. As the condition of the ?: operator we have so far used only expressions (such as comparisons) that can yield the logical results true or false. The C language has no special type for logical values. Operators that yield logical values (such as the relational and comparison operators, ...

    ...
      if (delta >= 0) {
        printf("Solution 1: %f\n", (-b - sqrt(delta)) / 2 / a);
        printf("Solution 2: %f\n", (-b + sqrt(delta)) / 2 / a);
      } else printf("No real solution\n");
    }

In this example, the branch corresponding to the "true" value of the condition contains a compound statement made of two statements, while the "else" branch contains a single statement.

As another example, we write a function that returns the value (from 0 to 15) corresponding to the hexadecimal digit given as argument, or -1 for any other character. If the parameter is indeed a hexadecimal digit, a second decision follows, according to whether the character is a decimal digit or not (hence a letter); the latter case is handled uniformly by converting to lowercase. Otherwise, the function returns -1.

    #include <ctype.h>

    int xdigitvalue(int c)
    {
      if (isxdigit(c))                        // is it a hex digit?
        if (isdigit(c)) return c - '0';       // decimal digit
        else return tolower(c) - 'a' + 10;    // hex letter
      else return -1;                         // anything else
    }

This example shows that decisions can be nested, producing complex logical structures. Here, the first clause (statement) of the first if statement is itself a conditional statement, giving three execution paths through the function in total, each ending in a return statement. A return ends the execution of the function even if other statements follow it in the text. The same effect is obtained by writing the function body with the conditional expression:

    return isxdigit(c) ? isdigit(c) ? c - '0' : tolower(c) - 'a' + 10 : -1;

Comparing the conditional expression with the conditional statement, the first observation is that, obviously, the two branches hold expressions of the same type in one case and statements in the other. Second, the ?: operator must have an expression for each of its two branches, whereas the if statement also exists in a short form without else, which specifies a statement to execute only when the condition is true.

The conditional statement. Preliminary version, 23 March 2008. Programarea calculatoarelor, course notes, Marius Minea.

This difference becomes important when the statement ends a function, as above. A function that returns a value (has a type other than void) must end its execution with a well-defined result in every case. Every execution path through the function must reach a statement of the form return expression; and it is an error if the closing brace } is reached instead, since the behavior is undefined when the unknown result of the call is used. For example, omitting the last clause else return -1; would have been incorrect, leaving the result undefined for a parameter that is not a hexadecimal digit. The compiler can detect such situations and issue a warning. Although much less used, the conditional expression is safer here: its syntax forces the programmer to specify values on both branches.
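As a small illustration of the rule that every path through a non-void function must return a value, the sketch below writes the same three-way decision both ways. The function names sign_if and sign_expr are hypothetical and do not come from the notes.

```c
/* Statement form: each of the three paths ends in its own return.
   Dropping the final else clause would leave one path without a result. */
int sign_if(int x)
{
  if (x > 0) return 1;
  else if (x < 0) return -1;
  else return 0;
}

/* Expression form: the syntax of ?: requires a value on every branch,
   so a missing case cannot be written at all. */
int sign_expr(int x)
{
  return x > 0 ? 1 : x < 0 ? -1 : 0;
}
```

Both functions give the same result for every argument; the difference is only in which mistakes the syntax makes possible.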
On the other hand, combining several ?: operators becomes hard to read, which is why the if statement is preferable.

The layout of the function illustrates some structuring conventions that make programs easier to read. The code on the branches of a decision is written one line lower, indented inward (usually by two spaces) relative to the keyword if or else. If the branch is a single statement that is very short as text, it can be placed more compactly on the same line as if or else.

Using complex decisions raises the problem of associating their branches correctly. For example, the following structure could be read in two ways:

    if (expr1) if (expr2) instr1 else instr2

One reading takes the first if to be in the short form, with a complete if statement as its true branch. The other reading takes the first if to be in the complete form, its true branch being an if in the short form, without else. The standard prescribes the first interpretation: an else is always associated with the nearest if allowed by the syntax. If we want the second variant, we must separate the second if from the else branch that does not belong to it by enclosing it in a block, as on the right side of the figure. The implicit association is equivalent to the one on the left side of the figure. Although the braces are not necessary there, adding them clarifies the structure and avoids confusion when the program is modified later.

    if (expr1) {                  if (expr1) {
      if (expr2) instr1             if (expr2) instr1
      else instr2                 }
    }                             else instr2

    a) implicit association       b) association changed with { }

Another way to remember and explain this rule is that the association chosen is the one that could be extended, and thus balanced, with one more else clause if it followed immediately in the program. This is possible with the association on the left of the figure (to the nearest if), but not with the one on the right.
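The two associations can be told apart by their results. The following is a minimal sketch with hypothetical function names; return statements stand in for instr1 and instr2, and a final return 0 marks the case in which neither was executed. A compiler may warn about the unbraced form, which is left unbraced here on purpose.

```c
/* Unbraced form: the else binds to the nearest if (the inner one),
   whatever the indentation suggests. */
int implicit_assoc(int e1, int e2)
{
  if (e1)
    if (e2) return 1;
    else return 2;
  return 0;
}

/* The braces end the inner if, so the else belongs to the outer one. */
int outer_assoc(int e1, int e2)
{
  if (e1) {
    if (e2) return 1;
  }
  else return 2;
  return 0;
}
```

With e1 true and e2 false, implicit_assoc returns 2 while outer_assoc returns 0; with e1 false, implicit_assoc returns 0 while outer_assoc returns 2.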
Indentation, often suggested by program editors, helps us visualize the structure of the code, but it does not change the rules of the syntax (extra spaces are ignored by the compiler). In particular, it cannot change the else - if association in the example above, and aligning statements at the same level one under another cannot substitute for grouping them in a block:

    if (expr)                     if (expr)
      instr1                        instr1
    else             is in fact   else
      instr2                        instr2
      instr3                      instr3

As an element of defensive programming, it is recommended to use compound statements on both branches of a decision, even when each contains only one statement, to make the structure of the code stand out and to avoid errors when code is added later. For example, a statement written after the single statement of an else clause no longer belongs to that clause syntactically; it is an independent statement and is executed after the if statement, whichever way the decision goes.

We return to the problem of printing a number in base 10 character by character. The if statement lets us translate the recursive definition directly: the recursive call for n / 10 must be made only for numbers with more than one digit. The desired effect of the function is printing; it performs no computation and need not return a result. Consequently, it can be declared with type void (the empty type, with no values), meaning that it does not return a value.

    #include <stdio.h>

    void printnat(unsigned n)
    {
      if (n > 9) printnat(n / 10);    // recursive branch
      putchar('0' + n % 10);          // base case: last digit
    }

The execution of the function ends with the last statement in its body, without needing a return statement. A return can still be used (in the form return; without an expression, since the function has no result) wherever we want to leave the function without continuing to the end of its body.

4.3 Logical operators

In conditional expressions and in the if statement we have so far used simple expressions, more precisely, expressions with at most one comparison operator:

    ...

    #include <stdio.h>

    void revline(int c)
    {
      if (c != EOF && c != '\n') {
        revline(getchar());
        putchar(c);
      }
    }

    int main(void)
    {
      revline(getchar());
      putchar('\n');
      return 0;
    }

5 Variables and assignment

5.1 Evaluating and reusing expressions

Functions in C, like those in mathematics, have as their main purpose carrying out computations, with the data usually passed as parameters. Sometimes, however, functions use intermediate results in their computations. A function computing the roots of a quadratic equation, for example, would first compute the discriminant and then select one of three variants depending on its sign. This takes two decisions, and the code would have the structure:

    if (discriminant > 0) {
      print solution 1
      print solution 2
    } else if (discriminant == 0)
      print the single solution
    else printf("no real solution\n");

In the outline above, whether we replace discriminant with a call to a function that computes its value or directly with the expression itself, the computation is repeated at every occurrence (including in the computation of the two distinct solutions). Beyond the inefficiency, repeating the same fragment in the program (here the expression discrim(a, b, c) or b*b - 4*a*c) is harmful in itself: it leads to code that is more complicated and harder to maintain, with potential errors if later changes are not made consistently in all the duplicated fragments.

One solution, illustrated earlier, is to write a function that takes as parameters a, b, and the already computed discriminant delta. It would then be called from the function that takes a, b and c as parameters, as required.

    void printsol(double a, double b, double delta);    // written in the previous chapter, similar to the outline above

    void solve_eq2(double a, double b, double c)
    {
      printsol(a, b, b*b - 4*a*c);
    }

This way, the expression for the discriminant appears and is computed only once, at the call, and its value is then referred to inside the function by the name of the parameter.
Still, writing an auxiliary function for this simple problem is unnatural. We met the same situation with the function readint, which has no parameters but passes a first character read with getchar() to the auxiliary function readint_c, which continues reading according to whether that character is a sign or not. There we cannot repeat the call getchar() in place of each use of the parameter c, because repeated calls would consume and return different characters from the input.

We give one more example, to insist on the way parameters are passed in C: by value. We write a function that computes the (natural) power of a real number by successively halving the exponent, following the recurrence relation:

    x^n = 1    for n = 0
    ...

    #include <math.h>    // for the declaration of the function sqrt
    #include <stdio.h>

    void solve_eq2(double a, double b, double c)    // requires a != 0
    {
      double delta = b*b - 4*a*c;
      if (delta > 0) {
        printf("solution 1: %f\n", (-b - sqrt(delta)) / 2 / a);
        printf("solution 2: %f\n", (-b + sqrt(delta)) / 2 / a);
      } else if (delta == 0) printf("single solution: %f\n", -b / 2 / a);
      else printf("no solution\n");
    }

Variables and assignment. Preliminary version, 23 March 2008. Programarea calculatoarelor, course notes, Marius Minea.

We can write declarations inside any compound statement, in particular in the body of a function. Variables declared this way are called local (we discuss later the declarations of global variables, placed outside functions). The following notions are important for using local variables:

- The scope is the portion of the program where the declared identifier can be used (is recognized as the name of a variable of the stated type). It extends from the point of declaration to the end of the enclosing block.
- The lifetime is the portion of the program's execution during which the value of the variable is kept in memory. It likewise extends from the moment the declaration is reached until the block is left.
- Initialization is performed each time the declaration is reached. The initializing expression is therefore re-evaluated each time execution passes through the declaration, and can produce different results: each call to the function readint produces a new evaluation of the call getchar(), which returns another character from the input. The variable c cannot be used outside the function, and its value is lost between two successive calls.

The ANSI C standard (1989) and its ISO counterpart of 1990 allowed declarations only before all the statements of a block. The C99 standard defines a block as an arbitrary sequence of declarations and statements. We can thus declare and initialize a variable right at the point in the program where it is needed to hold the value of an expression.

In the examples so far, the value of the variables is not modified; they are used only to refer to the value the expression had at initialization. This corresponds to functional programming languages, where the term binding is used for the link between the name of a variable and the value of an expression. Fundamentally, this notion is distinct from mutability, the property of being able to modify the value to which a variable is bound. In C, which is an imperative language, no such distinction is made, and the value of a variable can be modified through the assignment operation (written with the same symbol, =, as initialization in a declaration, although the two notions differ both conceptually and syntactically). Put differently, in C we do not regard a variable conceptually as a name bound to a value, but operationally as a name associated with a memory location whose contents can be modified any number of times.

The exception is variables declared with the type qualifier const before the type name, for example:

    const double e = 2.71828;

Modifying the value of an object declared this way is not allowed, and the compiler reports any assignment to it in the program as an error.

5.3 Assignment

The assignment operation modifies a stored value. Its syntax is:

    lvalue = expression

The term lvalue, a value that can appear on the left-hand side of an assignment, means an expression that designates an object, that is, in C, a region of memory that holds a value. This category includes variables and language elements presented later (array elements, references through addresses).

Assignment is an operation, and = is a binary operator, with two operands. The effect of the operation is to evaluate both operands (evaluating the left one determines which object is assigned) and to assign the value of the right-hand expression to the object on the left. The value of the whole assignment expression is the value assigned.

In C, assignment is an expression and not a statement, and it can appear anywhere the syntax requires an expression, including as part of another expression. Of course, most often it is used on its own, followed by ; in an expression statement, for example: x = a + 5; It can, however, also be used on the right-hand side of another assignment:

    x1 = x2 = -b / (2*a)

The assignment operator is right-associative, so x2 is assigned -b / (2*a), and x1 is assigned the value of the expression x2 = -b / (2*a), which by definition is the value of the right-hand side. So x1 and x2 are assigned the same value, as the syntax suggests.

On the other hand, an assignment is not an lvalue expression: the assignment expression has the new value of the assigned object, but does not designate the object itself. Consequently, we cannot use an assignment on the left-hand side of another assignment.
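The right-to-left grouping of chained assignments can be observed directly. The sketch below uses hypothetical function names. Its first function relies on a detail of standard C that goes slightly beyond the text: the value of an assignment expression is the value stored in the left operand, that is, after conversion to the left operand's type.

```c
/* Returns the value that ends up in d after the chain d = i = v.
   The chain groups as d = (i = v): i receives v converted to int,
   and that converted value (not v itself) is then assigned to d. */
double chain_through_int(double v)
{
  int i;
  double d;
  d = i = v;
  return d;
}

/* Assigns the same value to three variables and returns their sum. */
int chain_sum(int x)
{
  int a, b, c;
  a = b = c = x;
  return a + b + c;
}
```

A call chain_through_int(3.7) yields 3.0, since the fractional part is lost when the value passes through i, while chain_sum(7) yields 21.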
Warning: a frequent error is using the assignment = instead of the comparison ==, or the other way round; the resulting code can be syntactically correct but behave differently from what was intended. Using comparison instead of assignment yields expression statements such as x == y + z; which have no effect (the result of the comparison is not used); many compilers issue a warning. The more frequent mistake is writing = instead of ==. A decision of the form if (x = 5) is syntactically correct, since assignment is an expression! Its value is that of the assigned expression, here nonzero, so the "condition" is always true, and x is assigned unintentionally. Similarly, x = 0 results in a condition that is always false. Here too the compiler can warn when assignment is used as the main operator in a decision, but the programmer's intent is harder to assess. The idiom if (var = func()) is valid and is actually used in programs. It appears when the result of the function must be stored for later use and at the same time tested for validity (a null result being considered invalid, a standard convention for the address type). The assignment expression in the test is considered true only if it is nonzero, so the code on the first branch of the decision is executed only for a valid result. Still, the explicit comparison if ((var = func()) != 0), even if semantically redundant, is clearer and expresses the intent of the code better.

5.4 Value and side effect

Reading and writing (from standard input and to standard output, or more generally from and to files) and assignment to a stored object are the main cases in which a program modifies the state of its execution environment, an action called a side effect. The name is consistent with the functional approach we introduced at the beginning: the main purpose of a function is to compute its result, and any modifications are secondary (lateral) to the main flow of computation. In the imperative programming style practiced in C, however, the side effect plays the central role: the main elementary statement is the expression statement, represented by assignment and by calls to input/output functions. The expression that forms such a statement is evaluated not for its value but for its side effect.

Every expression (except a call to a void function) has a value, and some also have a side effect, which can modify even the values of variables used in evaluating the expression. The question therefore arises of when the side effect takes place relative to the evaluation of the expression. The C language imposes no restrictions on the order in which the operands of an expression are evaluated, except those implicit in the definition of the conditional and sequencing operators and in the particular behavior of the operators && and ||. Likewise, the exact moment when side effects take place is left to the implementation and is constrained only by the definition of sequence points, where all the side effects of previous evaluations must have taken place. Such sequence points occur after each statement, before a function call (after the arguments are evaluated), before returning from a library function, after the evaluation of the first operand of && and ||, and so on. Understanding the details of evaluation order and side effects is important in order to avoid programs whose effect is implementation-dependent or undefined.

5.5 The pre- and postincrement operators

One of the most frequent special cases of assignment is incrementing or decrementing a variable used as a counter, for example i = i + 1 or i = i - 1. The C language provides the unary operators ++ and -- for writing these expressions more concisely. They too are assignment operators, and they deserve special attention because of their frequent use and the existence of two different variants, prefix and postfix: ++lvalue and lvalue++ respectively. We discuss incrementing; all remarks also hold for decrementing, with +1 changed to -1.

Applied as a prefix or a postfix operator, ++ has the same side effect: incrementing the object referred to by the expression (in particular, a variable) by the value +1 appropriate to its type (this detail, apparently obvious for numeric types, is very important for addresses, discussed later). The value of the increment expression, however, differs in the two cases. For the postfix operator, the value of the whole expression is that of the operand before incrementing, whereas for the prefix operator, the value of the expression is that of the operand after incrementing. To make this precise with an example, the sequence

    int n = 0; printf("%d ", n++); printf("%d ", n);

prints 0 1, but the following companion sequence prints 1 1:

    int n = 0; printf("%d ", ++n); printf("%d ", n);

In both cases the second number printed is 1, because the effect of the two operations on n is the same: incrementing. The first number printed differs; it is the value of the increment expression, that is, the value of n before and after incrementing respectively, according to the definition given.

The difference is easy to remember because the position of the operator relative to the expression suggests the relation in time between the assignment and the evaluation of the expression. We can consider that both ++n and n++ have the value of n, but in the first case the assignment is performed before the expression is evaluated, and in the second case after. Accordingly, the names preincrement and postincrement are used, and likewise for decrement. Used as expression statements, both have the same effect (since the value of the expression is not used). Used inside other expressions, by contrast, they differ through their value, as in the printing example.

We close by illustrating with an example the aspects related to evaluation order and side effects. A mistaken way of expressing the intent to increment is i = i++. This expression has an undefined effect on the variable i, because it contains two side effects, the increment and the assignment, and the order in which they are applied is not restricted by the language. Assuming the initial value of i is 2, the value of i++ is also 2. As a result of the expression, i is on the one hand to be incremented and on the other hand to be assigned 2. Depending on the order in which the two modifications are applied, the final value of i can be 2 or 3. Likewise, for i = ++i, the side effects are the increment and the assignment of 3 (the value of ++i if i was 2), and the final value of i can be 3 or 4 (if it is incremented after the assignment). Both expressions have an undefined effect in C, and a compiler can flag them with a warning message.

Also undefined is the effect of the printing sequence

    int i = 2; printf("%d %d", i++, ++i);

because C specifies neither the order in which the arguments of a function are evaluated nor the order between evaluating expressions and applying their side effects. Evaluating the two expressions and applying the effects in various orders produces:

    2 4    eval-1  effect-1 (i = 3)  eval-2  effect-2 (i = 4)
    3 3    eval-2  effect-2 (i = 3)  eval-1  effect-1 (i = 4)
    2 3    eval-1  eval-2  effect-1 (i = 3)  effect-2 (i = 4)

In conclusion, we must avoid expressions in which a variable affected by a side effect also appears in another subexpression while the order of evaluation of the two is unspecified, because in that case the overall effect is undefined and therefore erroneous. More generally, side effects in complex expressions must be handled with great care, preferring decomposition into simpler expressions.

5.6 Compound assignment operators

In practice, a variable is often updated by applying an elementary operation (addition, multiplication, etc.) to its current value, for example: x = x - 10 or n = n / q. The C language offers a shorthand for expressions of the form lvalue = lvalue op expression, namely lvalue op= expression. The examples above are rewritten as x -= 10 and n /= q. There are compound operators for the five arithmetic operators: +=, -=, *=, /=, %=, and others for bitwise operations, which will be presented in the corresponding chapter.
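The behavior of the increment and compound assignment operators described in sections 5.5 and 5.6 can be checked with a few small functions. This is only an illustrative sketch; all function names are hypothetical. The two numbers printed by each sequence in the text are packed into one integer (tens digit, then units) so that the functions can return them.

```c
/* Postfix: the expression n++ has the value of n before incrementing. */
int post_pair(void)
{
  int n = 0;
  int first = n++;      /* first is 0, n becomes 1 */
  return 10 * first + n;
}

/* Prefix: the expression ++n has the value of n after incrementing. */
int pre_pair(void)
{
  int n = 0;
  int first = ++n;      /* first is 1, n becomes 1 */
  return 10 * first + n;
}

/* A well-defined alternative to printf("%d %d", i++, ++i):
   each side effect gets its own statement, so the order is fixed. */
int two_steps(int i)
{
  int a = i++;          /* a is the old value of i */
  int b = ++i;          /* b is the old value of i plus 2 */
  return 10 * a + b;
}

/* Compound assignment: x -= 10 is shorthand for x = x - 10. */
int compound(int x, int q)
{
  x -= 10;
  x /= q;
  return x;
}
```

Here post_pair() returns 1 (the pair 0 1), pre_pair() returns 11 (the pair 1 1), two_steps(2) returns 24, and compound(50, 4) returns 10.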
In all cases, the compound operator is a single lexical element, with no space between the assignment operator = and the operator that precedes it.

5.7 A calculator for simple expressions

We write a program that reads from input an expression made of integers, arithmetic operators and parentheses, and computes its value. We first define recursively the form of the accepted expressions:

    expression ::= term | expression + term | expression - term
    term       ::= factor | term * factor | term / factor
    factor     ::= natural-number | + factor | - factor | ( expression )

In this definition, expression and term are structured similarly; only the operators differ, being additive and multiplicative respectively. For expression, for example, the recursive computation always carries a current value, initialized by the call to term. As constructions of the form ±term are read, they enter the computation of the new current value, which is passed as a parameter to the next recursive call.

So a basic variant, which accepts only the operators + and -, without parentheses, would have the structure:

    int expr(int e)    // called as expr(readnat())
    {
      int c = getchar();
      return c == '+' ? expr(e + readnat())
           : c == '-' ? expr(e - readnat())
           : e;
    }

Starting from this basic skeleton, we write the complete program taking the following observations into account:

- an expression is not built directly from natural numbers but from terms, so the call to readnat is replaced with a call to the (also recursive) function that reads and computes the value of a term
- a term is built like an expression, from factors
- a factor can be a (complex) expression in parentheses; in this case the function for expression is called recursively, and on return the presence of the closing parenthesis is checked
- we accept any number of spaces in the expression, which must be consumed before each operator or factor
- we print a message if a natural number does not appear where an operand is required

    #include <ctype.h>
    #include <stdio.h>

    int skipsp(int c) { return isspace(c) ? skipsp(getchar()) : c; }

    int skipspace(void) { return skipsp(getchar()); }

    unsigned readnat_rc(unsigned r, int c)
    {
      return isdigit(c) ? readnat_rc(10*r + (c - '0'), getchar())
                        : (ungetc(c, stdin), r);
    }

    int readnat_c(int c)
    {
      if (isdigit(c)) return readnat_rc(0, c);
      else { printf("number missing\n"); return 0; }
    }

    int expr(void);

    int factor(void)
    {
      int c = skipspace();
      if (c == '+') return factor();
      else if (c == '-') return - factor();
      else if (c == '(') {
        int res = expr();
        if ((c = skipspace()) != ')') printf(") missing\n");
        return res;
      } else return readnat_c(c);
    }

    int term2(int f)
    {
      int c = skipspace();
      if (c == '*') return term2(f * factor());
      else if (c == '/') return term2(f / factor());
      else { ungetc(c, stdin); return f; }
    }

    int term(void) { return term2(factor()); }

    int expr2(int t)
    {
      int c = skipspace();
      if (c == '+') return expr2(t + term());
      else if (c == '-') return expr2(t - term());
      else { ungetc(c, stdin); return t; }
    }

    int expr(void) { return expr2(term()); }

    int main(void)
    {
      printf("%d\n", expr());
      return 0;
    }

6 Iterative statements

6.1 Iteration implemented recursively
We have seen that through recursion we can define structures such as sequences and lists, and that we can implement computations over them: computing recurrent sequences, sums of series, approximations to a given precision, etc. It is important to be able to express repetition (iteration) directly as well, for efficiency but also for simpler writing. As a simple example, we consider counting (with printing) from an initial value to a given final value. Illustrating the saying that a journey, however long, begins with a single step, we write:

    ...
      while (n > (d = nextdiv(n, d))) {
        printf("%u*", d);
        n = n / d;
      }
      printf("%u\n", n);    // the remaining divisor
    }

We call the function with the desired number, for example printfact(108), which prints 2*2*3*3*3. The loop entry condition first updates, by assignment, the value of the next divisor, and then compares it with n. Thus, on entering the loop we know for certain that besides the divisor just found there is at least one more, and we can anticipate by printing the multiplication sign. The last divisor (which is the remaining value of n) is printed after leaving the loop.

We also present two solutions that have no side effect (the assignment to d) in the loop condition. In the first variant, the assignment to d has to be written twice in the function body: first as an initialization, and a second time at the end of the loop, so that the new value is tested at the next iteration. The second variant prints every factor followed by a multiplication sign * until n becomes 1. The last multiplication sign is then erased by writing the character \b (backspace), which moves the cursor back, after which the sign is overwritten with a space and the cursor is moved back again. This solution is useful for printing on the screen, but is not recommended for programs that might be used with output redirected (discussed later) to write to a file.

    void printfact1(unsigned n)
    {
      unsigned d = nextdiv(n, 2);
      while (n > d) {
        printf("%u*", d);
        n = n / d;
        d = nextdiv(n, d);
      }
      printf("%u\n", n);
    }

    void printfact2(unsigned n)
    {
      unsigned d = 2;
      while (n > 1) {
        d = nextdiv(n, d);
        n = n / d;
        printf("%u*", d);
      }
      printf("\b \b\n");
    }

6.3 Transforming recursion into iteration

Recursive and iterative solutions to a problem have different characteristics. The recursive solution is usually written without using assignment, the current data at each call being represented by the parameters of the function. For this, the function may need extra parameters to pass partial results already computed to the recursively called instance. In the iterative version, variables declared in the function serve this purpose.

The recursive version needs a test (an if statement or a conditional expression) to decide whether the sequence of recursive calls continues or the base case has been reached. In the iterative version, this test is the loop continuation condition. In place of a recursive call, the loop body updates the variables with the expressions that correspond to the new values of the parameters.

These aspects are illustrated below by comparing the recursive and iterative versions of the factorial function and of the function that reads a natural number.

    unsigned fact_r(unsigned n, unsigned r)    // initial call: fact_r(n, 1)
    {
      if (n > 0) return fact_r(n - 1, r * n);
      else return r;
    }

    unsigned fact(unsigned n)
    {
      unsigned r = 1;
      while (n > 0) {
        r = r * n;
        n = n - 1;
      }
      return r;
    }

Iterative statements. Preliminary version, 19 October 2011. Programarea calculatoarelor, course notes, Marius Minea.

    unsigned readnat_r(unsigned r)    // initial call: readnat_r(0)
    {
      int c;
      if (isdigit(c = getchar())) return readnat_r(r*10 + (c - '0'));
      ungetc(c, stdin);
      return r;
    }

    unsigned readnat(void)
    {
      unsigned r = 0;
      int c;
      while (isdigit(c = getchar())) r = r*10 + (c - '0');
      ungetc(c, stdin);
      return r;
    }

In the last example, the call isdigit(c = getchar()) combines three actions: a character is read from input, it is assigned to the variable c for later use, and it is passed as an argument to test whether it is a digit. Since the return statement on the true branch of the test in the recursive variant ends the execution of the function, the last two statements are executed only when the test is false, without requiring an explicit else branch.
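The same transformation can be applied step by step to another small function. The sketch below counts the decimal digits of a number; it is not taken from the notes, and the names ndigits_r and ndigits are hypothetical. The accumulator parameter r of the recursive version becomes a local variable, the test that selects the recursive branch becomes the loop condition, and the arguments of the recursive call become assignments in the loop body.

```c
/* Recursive version with an accumulator; initial call: ndigits_r(n, 1). */
unsigned ndigits_r(unsigned n, unsigned r)
{
  if (n > 9) return ndigits_r(n / 10, r + 1);
  else return r;
}

/* Iterative version obtained from the recursive one. */
unsigned ndigits(unsigned n)
{
  unsigned r = 1;          /* initial value of the accumulator */
  while (n > 9) {          /* same test as the recursive branch */
    n = n / 10;            /* new values of the two parameters */
    r = r + 1;
  }
  return r;                /* result of the base case */
}
```

Both versions return 1 for any single-digit argument (including 0) and 5 for 12345.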
ultimele doua instructiuni se executa doar atunci cand testul e fals, fara a necesita scrierea explicitaa a unei ramuri else sirul lui Fibonacci Unul din cele mai cunoscute siruri recurente e tirul lui Fibonacci, definit ca: F0 = 0, Fi = 1, Fn = Fn 1 + Fn 2 (pentru n > 2) Transcriind direct definitia recursiva obtinem: unsigned fib r(unsigned n) return n #define EPS 1e-6 double radacina(double x)    trebuie x >= 0 instructiuni iterative Versiune preliminara 5 19 octombrie 2011 Programarea calculatoarelor Note de curs Marius Minea double ult, crt =1 0;    crt trebuie initializat do { ult = crt; crt = (crt + x crt) 2; } while (fabs(crt - ult) > EPS); return crt; Filtrarea de texte Dam un alt exemplu legat de prelucrarea de texte Dorim sa extragem informatia utila dintr-un text scris in HTML (sau XML), unde ea este intercalata cu etichete de forma Practic, trebuie sa executam repetat doua actiuni: tiparirea textului util pana la caracterul , si reluarea procedurii Pentru aceasta scriem intai doua functii generice, care ignora, respectiv tiparesc, toate caracterele de intrare pana la un caracter de oprire dat ca parametru in ambele cazuri, trebuie sa luam precautia ti sa oprim prelucrarea la atingerea sfartitului de fisier, pentru a evita blocarea la infinit Ambele functii returneaza caracterul specificat la care s-au oprit, sau EOF #include int skipchars(int stop) { int c = getchar(); while (c != EOF && c != stop) c = getchar(); return c; int writechars(int stop) { int c = getchar(); while (c != EOF && c != stop) { putchar(c); c = getchar(); return c; Programul alterneaza cele doua functii, incepand cu tiparirea, care se va opri imediat in caz ca intalnette '); } while (c != EOF); return 0; instructiuni iterative Versiune preliminam 6 19 octombrie 2011 Programarea calculatoarelor Note de curs Marius Minea 6 instructiuni iterative 6 1 iteratia implementata recursiv Am vazut, ca prin putem defini structuri : siruri, liste, secvente, si putem im- plementa prelucrari 
over them: computing recurrent sequences, sums of series, approximations to a given precision, etc. It is important to be able to express repetition (iteration) directly as well, for efficiency but also for simpler code. As a simple example, we consider counting (with printing) from an initial value to a given final value. Illustrating the saying that a journey, however long, begins with a single step, we write:

[...]

      while (n > (d = nextdiv(n, d))) {
        printf("%u*", d);
        n = n / d;
      }
      printf("%u\n", n);   /* the remaining divisor */

We call the function with the desired number, for example printfact(108), which prints 2*2*3*3*3. The loop entry condition first updates, by assignment, the value of the next divisor, and then compares it with n. Thus, on entering the loop we know for certain that besides the divisor just found there will be at least one more, and we can anticipate by printing the multiplication sign. The last divisor (which is the remaining value of n) is printed after leaving the loop.

We also present two solutions that have no side effect (the assignment to d) in the loop condition. In the first version, the assignment to d has to be written twice in the function body: first as initialization, and a second time at the end of the loop body, so that the new value is tested at the next iteration. The second version prints every factor followed by a multiplication sign * until n becomes 1. Then the multiplication sign is erased by writing the character \b (backspace), which moves the cursor back; it is then overwritten with a space, and the cursor is moved back again. This solution is useful for printing on the screen, but is not recommended for programs that might be used with output redirection (discussed later) to write into a file.

    void printfact1(unsigned n)
    {
      unsigned d = nextdiv(n, 2);
      while (n > d) {
        printf("%u*", d);
        n = n / d;
        d = nextdiv(n, d);
      }
      printf("%u\n", n);
    }

    void printfact2(unsigned n)
    {
      unsigned d = 2;
      while (n > 1) {
        d = nextdiv(n, d);
        n = n / d;
        printf("%u*", d);
      }
      printf("\b \b\n");
    }

(Iterative statements, preliminary version, 24 March 2008.)

Installing OCaml + Emacs on Windows

Purpose: I wrote this tutorial because I too had difficulties installing the Emacs editor and the OCaml compiler and using them on Windows. Some steps may seem trivial, but I wanted to make sure the tutorial covers everything.

Notes:
• You can
try copying the cygwin folder to your colleagues' computers so that it does not have to be installed several times. It might not work; this depends on several factors. If you do it, archive the folder first.
• I use the 32-bit version because the 64-bit installation needs external libraries for the graphical version of the Emacs editor; the 32-bit version is a simpler solution and is sufficient for our course.
• I say nothing about many of the options in the Cygwin installer because the defaults are fine for what we need.
• Why text? Because text is easier to follow at your own pace than a video.
• Why did I not post a copy of my own Cygwin installation? It is quite large, even archived, and it is not guaranteed to work on every computer it is copied to, especially those whose partition letters differ from the usual ones.

Installation

1. Go to the Cygwin download page and download the setup program for 32-bit Windows.
2. After it has downloaded, run it. If other prompts appear, click Ok or Yes until the setup window appears.
3. Click Next until you reach the "Select Your Internet Connection" window. Click Next again and the list of package mirrors will appear.
4. Choose one of the mirrors ending in .com. Click Next.
5. The setup will download its package information; this takes a little while.
6. Then a list of packages appears, from which you can choose the necessary ones.
7. Search for emacs.
8. Click on Devel and then on Editors.
9. Click next to the packages emacs-ocaml, emacs and emacs-w32 so that they are marked for installation.
10. Then type ocaml in the search box.
11. Click on Interpreters.
12. Select ocaml and ocaml-compiler-libs; emacs-ocaml will already be checked, leave it that way.
13. Now click Next.
14. Click Next twice more; the setup will start downloading and installing the necessary packages. This takes a while, which is normal.

So that you do not get bored while it downloads, here are a few logic problems; the solutions are at the end of the document. Number 2 is a bit sadistic.

Problem 1: You are driving on a stormy night when you pass a bus stop and see three people waiting for the bus:
• An elderly woman who seems about to die
• An old friend who once saved your life
• The soulmate you have dreamed of all your life
Given that you can take only one passenger in your car, whom do you choose?

Problem 2: Naughty Nelu was warned by his mother never to open the cellar door, so that he would not see what he is not meant to see. One day, when his mother was away from home, Nelu opened the door. What did he see?

Problem 3: A mad villain forces you to play Russian roulette with him. The cylinder of the revolver holds three bullets, in consecutive chambers. The revolver has six chambers. At the start of the game the cylinder is spun once. Then the gun is passed back and forth between you and the villain until it fires a bullet. The villain lets you choose whether you shoot first or second. What would you choose in order to have a better chance of staying alive? Explain your answer. (This problem was part of a Google interview.)

15. After everything has been downloaded, the final setup window appears. Select "Create icon on Desktop", then click Finish.
16. Find the shortcut that appeared on the Desktop and double-click it.
17. Something like this appears:

    Copying skeleton files.
    These files are for the users to personalise their cygwin experience.
    They will never be overwritten nor automatically updated.
    './.bashrc' -> '/home/Tzeny//.bashrc'
    './.bash_profile' -> '/home/Tzeny//.bash_profile'
    './.inputrc' -> '/home/Tzeny//.inputrc'
    './.profile' -> '/home/Tzeny//.profile'
    Tzeny@TzenyPC ~
    $

Everything written in green will be different on your machine; that is fine.
18. Type touch .emacs and press Enter.
19. Now open My Computer / This PC and go to partition C:.
20. Open the cygwin folder.
21. Open the home folder.
22. Here there is another folder with your user name (in my case Tzeny; yours will be different). Open it.
23. It contains several files; right-click .bashrc and select Open with.
24. Select Notepad from the list that appears. On Windows 8 you have to click More options.
25. There will be a few lines of text, mostly comments about shell options. Your copy will not say Tzeny, but that is fine.
38. Click New Folder and create a folder named tuareg, then select it.
39. Click OK.
40. Close WinRAR.
41. If you have Cygwin open, close it; OCaml will not start until you do this.

Usage

1. Open cygwin, type emacs and press Enter. The Emacs editor opens.
2. Click File, then Visit New File.
4. Now press the Esc and x keys together; a prompt written in blue should appear at the bottom of the editor.
5. Type tuareg-mode and press Enter.
6. The text disappears. Press Esc and x together again; the same prompt appears. Type run-ocaml and press Enter.
7. A confirmation text appears. Press Enter.
8. The Emacs editor is now split in two: the top part is the file, the bottom part is the OCaml interpreter.

Example

1. Click in the top part and type the example function: let f x = x + 1;;
2. Press Ctrl and c together, then Ctrl and b. The interpreter's response should appear in the bottom part of the editor.
3. To try the function, type: f 2;;
4. The interpreter answers 3.

Congratulations, you have Emacs and OCaml installed on Windows!

Saving files

1. Click File, then Save As.
2. To save on the Desktop, click Desktop, type a name, for example fisier.ml, then click Open.

Solutions to the logic problems:

1. The elderly woman, of course! After helping her into the car, you hand the keys to your friend and wait for the bus together with the partner of your dreams.
2. When Naughty Nelu opened the cellar door, he saw the living room and, through the window, the garden. He had never seen anything like it, because his mother had kept him locked in the cellar all his life.
3. You should choose to shoot second. Number the chambers from 1 to 6. Chambers 1-3 hold bullets, chambers 4-6 are empty. After the cylinder is spun, there are six possibilities for the starting chamber:
   1. Chamber 1: player 1 dies
   2. Chamber 2: player 1 dies
   3. Chamber 3: player 1 dies
   4. Chamber 4: player 2 dies (first shot, player 1, chamber 4 empty; second shot, player 2, chamber 5 empty; third shot, player 1, chamber 6 empty; fourth shot, player 2, chamber 1 loaded)
   5. Chamber 5: player 1 dies (same reasoning as above)
   6. Chamber 6: player 2 dies (same reasoning)
   Player 1 dies in four of the six cases, so player 2 has four chances in six of winning.

Author: Tenescu Andrei (Tzeny)

ReLooper: Refactoring for Loop Parallelism in Java

Danny Dig, University of Illinois, dig@cs.uiuc.edu
Mihai Tarce, Politehnica University of Timisoara, mihai.tarce@cs.upt.ro
Cosmin Radoi, Politehnica University of Timisoara, cosmin.radoi@cs.upt.ro
Marius Minea, Politehnica University of Timisoara, marius@cs.upt.ro
Ralph Johnson, University of Illinois, johnson@cs.uiuc.edu

Abstract

In the multicore era, sequential programs need to be refactored for parallelism. The next version of Java
provides ParallelArray, an array data structure that supports parallel operations over the array elements. For example, one can apply a procedure to each element, or reduce all elements to a new element in parallel. Refactoring an array to a ParallelArray requires (i) analyzing whether the loop iterations are safe for parallel execution, and (ii) replacing loops with the equivalent parallel operations. When done manually, these tasks are non-trivial and time-consuming. This demo presents ReLooper, an Eclipse-based refactoring tool that performs these tasks automatically. Preliminary experience with refactoring real programs shows that ReLooper is useful.

Categories and Subject Descriptors: D.1.3 [Software]: Concurrent Programming - Parallel Programming; D.2.3 [Software Engineering]: Coding Tools and Techniques - Program Editors

General Terms: Algorithms, Design

Keywords: Refactoring, program analysis, program transformation, parallelism and concurrency

1 Introduction

In the multicore era, unless programmers refactor the existing sequential programs for parallelism, they will not benefit from the underlying parallel processors. Refactoring for parallelism is non-trivial, because the refactored code needs to satisfy two conflicting goals: it needs to be thread-safe (i.e., run correctly when executed under multiple threads) and scalable (i.e., performance continues to improve when adding more cores).

The key to scaling performance is to use fine-grained parallelism. Java will include the ParallelArray framework, a special kind of array that provides fine-grained parallel operations. For example, one can apply a procedure to the elements of an array, map elements to new elements, or reduce all elements into a single value like a sum. The framework efficiently executes these parallel operations by splitting the computations on array elements among a pool of worker threads, and relying on a runtime library to balance the work among the processors in the system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. OOPSLA '09, October 25-29, 2009, Orlando, Florida, USA. Copyright © 2009 ACM $10.00

To refactor an existing array into a ParallelArray, the programmer constructs it by using factory methods (e.g., by copying elements from other arrays). Then the programmer identifies the loops that iterate over all the array elements and analyzes each loop to infer its intent (e.g., the loop reduces all elements to a value). Next, she replaces the loop body with a call to the equivalent parallel operation (e.g., reduce). The parallel operation takes an element operator as an argument and executes it on each element. Since Java does not support anonymous functions (i.e., lambda expressions), the programmer needs to encapsulate the operator inside an anonymous class, by subclassing one of the 132 operator classes and overriding the op method.

In addition, since ParallelArray assumes that the parallel computations do not interfere with each other, it runs them without any synchronization. It is the programmer's responsibility to verify that the loop iterations indeed do not have conflicting memory accesses. This analysis and code rewriting is non-trivial and time-consuming.

We have implemented a refactoring tool, ReLooper, that automates the safety analysis and the rewriting of code. ReLooper is integrated with Eclipse's refactoring engine, so it offers all the convenient features of a refactoring engine: previewing the changes, preserving the formatting, undoing changes, etc. To use ReLooper, the programmer selects an array and chooses ConvertToParallelArray from the refactoring menu.

2 The Refactoring Tool

Figure 1 shows a preview of the changes that ReLooper applies to a small program that works with an array of Complex numbers. A complex number has the form a + bi, where a is the real part and b is the imaginary part. The first loop in method ComplexTest.test() initializes the array elements using the factory method createRandom(). The second loop iterates over all the array elements and computes the square of each complex number. The third loop adds all the numbers and stores the result in the sum variable.

Transformations. ReLooper changes the type declaration of numbers into a ParallelArray of Complex objects. Then it replaces the code that allocates storage for the array with code that creates a ParallelArray with the same capacity, and specifies the base element type and the pool of worker threads that will be used at runtime (defaultExecutor() arranges to use most of the processors available).

    let rec prt_form = function
      | V s -> print_string s
      | Neg e -> print_char '~'; prt_form e          (* negation sign *)
      | And (e1, e2) -> print_char '('; prt_form e1;
                        print_char '&'; prt_form e2; (* conjunction sign *)
                        print_char ')'

2 Processing with a result, independent subformulas

The value of the function depends only on the formula, with no other parameters. The result is obtained by combining the values for the subformulas, without them influencing one another. For example, the number of operators in a formula:

    let rec count_ops = function
      | V s -> 0
      | Neg e -> 1 + count_ops e
      | And (e1, e2) -> 1 + count_ops e1 + count_ops e2

Similarly, the depth of a formula is the maximum number of operators "under" which a proposition lies (that is, which are applied to it). A proposition has depth 0 (it has no operators). Each operator adds 1 to the depth; for a binary operator, we take the maximum of the depths of the two operands.

    let rec maxd = function
      | V s -> 0
      | Neg e -> 1 + maxd e
      | And (e1, e2) -> 1 + max (maxd e1) (maxd e2)

3 Processing with an additional parameter, independent subformulas

As a variation on the problem above, we print the depth of each proposition. We pass a parameter that counts the current depth (the number of operators traversed
so far), and which is incremented at each call. The initial call is print_d 0 f.

    let rec print_d d = function
      | V s -> print_int d; print_char ' '; print_endline s
      | Neg e -> print_d (d+1) e
      | And (e1, e2) -> let d1 = d+1 in print_d d1 e1; print_d d1 e2

4 Processing with a parameter and a result, independent subformulas

In this case we pass information both downward, through a parameter, and upward, as a result. For example, we determine whether a given proposition (given as a string) appears with negative polarity (under an odd number of negations) in a formula. For this, we pass from the top down a boolean carrying this information (is the current subformula under an odd number of negations? initially, false). The initial call for proposition v is isnv v false f.

    let rec isnv v neg = function
      | V s -> neg && (s = v)
      | Neg e -> isnv v (not neg) e
      | And (e1, e2) -> isnv v neg e1 || isnv v neg e2

5 Simplifications: processing on return from recursion

When traversing a recursive structure, it is important to distinguish whether the processing is done on the way down or on return. When simplifying a formula, it is important to do so (also) on return from the call (after processing the subformulas), in order to use any results that have already been simplified. Consider formulas that have both propositional variables and boolean constants:

    type bform = B of bool | V of string | Neg of bform | And of bform * bform

We write a function which simplifies according to the rules A ∧ F = F and A ∧ T = A, for any formula A.

    let rec simpand = function
      | And (_, B false) | And (B false, _) -> B false
      | And (e, B true) | And (B true, e) -> simpand e
      | And (e1, e2) -> And (simpand e1, simpand e2)
      | Neg e -> Neg (simpand e)
      | p -> p

On the first branch it is good that the pattern is detected on the way down: there is no point in processing a subformula that disappears through conjunction with false. On the second branch (e ∧ T = e) the recursive call, which simplifies the subformula e, is sufficient. On the third branch, however, we miss opportunities for simplification. Called with And (V s, And (B true, B true)) for some proposition V s, the function simplifies And (B true, B true) to B true at the second level of recursive calls. But without looking for the pattern on return as well, the result And (V s, B true) is not simplified any further. We therefore rewrite the function so that it checks for the pattern after the subformulas have already been simplified:

    let rec simpand = function
      | And (_, B false) | And (B false, _) -> B false
      | And (e1, e2) -> (match (simpand e1, simpand e2) with
          | (e, B true) | (B true, e) -> e
          | (_, B false) | (B false, _) -> B false
          | (e1, e2) -> And (e1, e2))
      | Neg e -> Neg (simpand e)
      | p -> p

Now we obtain the complete simplification down to V s. We have also written functions for evaluating formulas (arithmetic ones in lecture 2, then in lecture 6 boolean formulas under a given truth assignment). These are special cases of simplification down to a single value. The code follows the same sequence: first evaluate the subformulas, then combine the results (here, a simple operation on integers or booleans).

6 Processing with dependencies between subformulas

In this case the processing of the subformulas is no longer independent. This can happen when we want the processing to take place in a certain order. The result for one subformula may become a parameter of the computation for another subformula. For example, to number each proposition, we pass as a parameter the last number used before processing the formula, and return the last number used after processing it. The function below prints the number given to each proposition and returns the last one, so it counts the propositions. For a binary operator, the result for the left subformula e1 becomes the parameter in the computation for the right subformula e2. The initial call is cntp 0 f.

    let rec cntp n = function
      | V s -> let n1 = n+1 in Printf.printf "%d %s\n" n1 s; n1
      | Neg e -> cntp n e
      | And (e1, e2) -> cntp (cntp n e1) e2

Conversely, we can number each operator. For a proposition, the number received as a parameter is also the one returned. The current operator can be counted either before or after counting the operators in the subformulas; in both versions the initial call is cntop 0 f.

    let rec cntop n = function
      | V s -> n
      | Neg e -> cntop (n+1) e
      | And (e1, e2) -> cntop (cntop (n+1) e1) e2

    let rec cntop n = function
      | V s -> n
      | Neg e -> 1 + cntop n e
      | And (e1, e2) -> 1 + cntop (cntop n e1) e2

In either case, we could use, within
a compound expression (Neg, And), the values returned for its subexpressions, if the problem requires it.

3 Processing recursive types

Expressions of various kinds (arithmetic, boolean) are among the most familiar examples of recursion. An expression is either an atomic expression (one that cannot be decomposed further: a number, a logical proposition), or it is obtained by applying an operator to some subexpressions (operands). Evaluating an expression, even one with a complicated structure such as (3 + 9) / ((5 - 3) * (7 - 4)), follows a simple rule: we identify the last operator to be applied (here the division /), evaluate the two operands (simpler expressions) and apply the operator. What we have just written (we evaluate the expression by first evaluating its subexpressions) shows that evaluation is an inherently recursive procedure that follows the structure of the expression. The base case is a simple expression that contains no operators (in the example, a number).

From here on we work with a recursive type for propositional formulas:

    type bform = V of string | Neg of bform | And of bform * bform

We use a single binary operator, which is in any case sufficient to express any formula, and because the processing given as examples depends only on the structure of the formula, not on its meaning (semantics). Since the formula type is defined recursively, any (non-trivial) processing of a formula is necessarily recursive, because it has to decompose the formula according to the definition. We say that such processing works by structural decomposition.

Processing without a result, independent subformulas

These are among the simplest functions, for example printing. Each subformula is processed independently; the calls for several subformulas are made one after another.

    let rec print_form = function
      | V s -> print_string s
      | Neg e -> print_string "~("; print_form e; print_char ')'   (* negation sign *)
      | And (e1, e2) -> print_char '('; print_form e1;
                        print_char '&'; print_form e2;            (* conjunction sign *)
                        print_char ')'

Processing with a result, independent subformulas

The value of the function depends only on the formula, with no other parameters. The result is obtained by combining
the values for the subformulas, without them influencing one another. For example, the number of operators in a formula:

    let rec count_ops = function
      | V s -> 0
      | Neg e -> 1 + count_ops e
      | And (e1, e2) -> 1 + count_ops e1 + count_ops e2

Similarly, the depth of a formula is the maximum number of operators "under" which a proposition lies (that is, which are applied to it). A proposition has depth 0 (it has no operators). Each operator adds 1 to the depth; for a binary operator, we take the maximum of the depths of the two operands.

    let rec maxd = function
      | V s -> 0
      | Neg e -> 1 + maxd e
      | And (e1, e2) -> 1 + max (maxd e1) (maxd e2)

Processing with an additional parameter, independent subformulas

As a variation on the problem above, we print the depth of each proposition. We pass a parameter that counts the current depth (the number of operators traversed so far), and which is incremented at each call. The initial call is pr1 0 f.

    let rec pr1 d = function
      | V s -> print_int d; print_char ' '; print_endline s
      | Neg e -> pr1 (d+1) e
      | And (e1, e2) -> pr1 (d+1) e1; pr1 (d+1) e2

Processing with a parameter and a result, independent subformulas

In this case we pass information both downward, through a parameter, and upward, as a result. For example, we want to compute the sum of the depths of all the propositions in a formula. The initial call is sumd 0 f.

    let rec sumd d = function
      | V s -> d
      | Neg e -> sumd (d+1) e
      | And (e1, e2) -> sumd (d+1) e1 + sumd (d+1) e2

Of course, as for any function, the result may have two components. We can thus compute the average depth by also returning the number of propositions. The division is integer division (with remainder), and since every formula contains propositions, there is no risk of dividing by zero.

    let rec avd d = function
      | V s -> (1, d)
      | Neg e -> avd (d+1) e
      | And (e1, e2) ->
          let (n1, s1) = avd (d+1) e1
          and (n2, s2) = avd (d+1) e2
          in (n1 + n2, s1 + s2)

The average depth of a formula f is then obtained as: let (n, s) = avd 0 f in s / n

Processing with dependencies between subformulas

In this case the processing of the subformulas is no longer independent. This can happen when we want the processing to take place in a certain order. The result for one subformula may become a parameter of the computation for another subformula. For example, to
number each proposition, we pass as a parameter the last number used before processing the formula, and return the last number used after processing it. The function below prints the number given to each proposition and returns the last one, so it counts the propositions. For a binary operator, the result for the left subformula e1 becomes the parameter in the computation for the right subformula e2. The initial call is cntp 0 f.

    let rec cntp n = function
      | V s -> let n1 = n+1 in Printf.printf "%d %s\n" n1 s; n1
      | Neg e -> cntp n e
      | And (e1, e2) -> cntp (cntp n e1) e2

Conversely, we can number each operator. For a proposition, the number received as a parameter is also the one returned. The current operator can be counted either before or after counting the operators in the subformulas; in both versions the initial call is cntop 0 f.

    let rec cntop n = function
      | V s -> n
      | Neg e -> cntop (n+1) e
      | And (e1, e2) -> cntop (cntop (n+1) e1) e2

    let rec cntop n = function
      | V s -> n
      | Neg e -> 1 + cntop n e
      | And (e1, e2) -> 1 + cntop (cntop n e1) e2

In either case, we could use, within a compound expression (Neg, And), the values returned for its subexpressions, if the problem requires it.

A function cannot return a value of one type in one case and a value of another type in other cases

The languages C and ML are statically typed: the type of the value associated with an identifier (name) can be determined statically, at compile time, before the program is run. In C, types must be given explicitly (for variables, parameters, function results). In ML, they can be inferred by the compiler. Static typing forces the value returned by a function to have the same type in all cases. In C, the type is declared before the function name. In ML, for example, if we have written | [] -> 0 on one branch, the function must always return an integer, and we cannot write | h :: t -> h :: f (n-1) t on another branch, because a list is returned there.

The value returned by a function must have the type required by the problem

The exam problems asked you to return: a value from a list, a number of elements of a list, a list, a pair of lists, or just to print.
All of these are different types. Some of you wrote the same pattern for three different problems: | [] -> [], or | [] -> 0. As discussed above, this determines the type of the function in all cases. A function that starts with the first pattern will always return a list, so it is incorrect for a problem that does not ask for a list. The number of elements and an element cannot be compared, because the list might not be a list of numbers.

The value () means "nothing" and is the only value of type unit. It is the value returned by all printing functions (which are used for their effect, not for the value they return). A function with the pattern | [] -> () must return () in all cases, so it will not have any useful value.

An exception and an error message are not the same thing

Some functions do not have values defined for all arguments. The function List.hd, which gives the first element of a list, cannot return a value for the empty list. In such cases, the function raises an exception (in this case, Failure "hd", where Failure is a predefined exception with a string argument). The exception ends the execution of the function without returning a result, and can be handled in the program fragment that called the function.

Returning or printing an error message is not the same thing: it is still a result, which determines the type of the function (in all cases). If we write | [] -> "eroare lista vida", the function is forced to return a string every time. If we write | [] -> print_string "eroare lista vida", the function can return only () (the result of printing). In both cases, the function can no longer return the first element of the list on the other branch. For this we need exceptions, which can be used regardless of the type of the function. We write | [] -> raise (Failure "lista vida"), or we use the function failwith, which raises the same exception: | [] -> failwith "lista vida".

In pure functional languages there is no assignment

Some of you wrote code such as -> cnt = cnt + 1; preorder cnt, trying to modify cnt by
assignment. In pure functional languages assignment does not exist and we do not need it. What we want is to use a new value from that point on. For this, it is enough to pass the new value as a parameter: preorder (cnt + 1). If that value will be used several times, we can give it a new name: let cnt1 = cnt + 1 in printf "%d" cnt1; preorder cnt1.

A relation is not just a pair

A binary relation between A and B is a subset R ⊆ A × B of the Cartesian product A × B. So R is a set of pairs. (a, b) is a pair, not a relation, just as 5 is an integer, not a set of integers. (a, b) can be an element of a relation: (a, b) ∈ R, sometimes written R(a, b), and it can even be the only element of that relation: R = {(a, b)}. But we cannot write R = (a, b): R is a set, (a, b) is a pair. Reflexivity, symmetry and transitivity are defined for binary relations on a set (A = B). (c, b, a) is just the triple (a, b, c) written in reverse order. It has nothing to do with a binary relation (and hence nothing to do with a symmetric one either, even on a set of 3 elements), because a binary relation contains pairs.

There are countable and uncountable infinite sets

The set of natural numbers is infinite and countable (it is in fact the reference point for defining countable sets). The set of real numbers is infinite and uncountable. Statements such as "The set is infinite, hence uncountable" or "Any set is finite is also countable" raise serious doubts about the ability to reason logically.

A statement is not proved merely by stating the definition

Consider these statements (from exam papers): a) The set of vectors of integers is countable because its cardinality equals the cardinality of a subset of the natural numbers. b) The set of vectors of integers is uncountable because its cardinality does not equal the cardinality of a subset of the natural numbers. "Has cardinality equal to ..." and "does not have cardinality equal to ..." are two (contradictory) statements. They are made by taking the definition of a
countable set and asserting, without any argument, that in this case the condition in the definition is satisfied (respectively not satisfied). Neither statement is proved. We have no basis to support either of them. As another example, for the number n = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 we can assert: (a) n is composite, because it is a product of two factors > 1; (b) n is prime, because it is not a product of two factors > 1. Does either statement convince you? Can you tell which of them is true? Both statements try to apply a definition (of a composite number, respectively of a prime number). Neither statement proves that the condition in the definition holds. For (a), n would actually have to be written as a product of two numbers. For (b), one would have to check that no number strictly between 1 and n divides n. In fact, n = 37975227936943673922808872755445627854565536638199 × 40094690950920881030683735292761468389214899724061.

In the exam problem, the set of vectors of integers is indeed countable. Some argued that we first count the vectors of length 1, then those of length 2, and so on. The intuition is partly right, but the vectors of length 1 are already infinitely many (they correspond to the set of integers), so we cannot enumerate all of them first. We can, however, find another counting order based on the same intuition of counting the "small" vectors first: for a vector v = (a1, a2, ..., ak) we define a measure that takes into account both the length of the vector and the size of its elements: m(v) = k + |a1| + ... + |ak|. There are finitely many vectors of measure n: the length and the absolute value of each element are at most n. So we can enumerate all vectors of integers in increasing order of measure. Another option would be to first prove that the vectors of k integers are countable, which is easily done by induction, by viewing a vector of length k + 1 as a pair made of a vector of length k and one more number, and applying the same argument as for
the rational numbers. We then have a countable sequence (indexed by the vector length k) of countable sets, which is countable (using the same diagonal construction as for the rational numbers).

Equivalence is implication in both directions

A: Everyone who failed an exam has passed an exam. B: Nobody failed all exams. Many claimed: "The two statements are equivalent because if someone failed an exam then they also passed one, so they did not fail all exams." This shows (informally) only that A → B. For the statements to be equivalent, B → A must also be shown.

Two statements are NOT equivalent because they imply the same thing

Some claimed: "The propositions are equivalent because both imply the same conclusion." This is false reasoning! With this deduction "rule" we could prove that two arbitrary statements are equivalent, since each of them implies true! As another example, take the statements x = 2 and x = 3. Both imply that x is positive. Obviously, they are not equivalent; on the contrary, they contradict each other!

A definition must be precise

A definition must be unambiguous: it must be clear and precise what it means, what matches the definition and what does not. For this, the definition must be stated rigorously, mathematically, using the basic notions learned: sets, functions, relations, logical formulas, etc. We cannot just say "an interpretation is the attempt to give a meaning to a formula". Nor "an automaton is something that goes from state to state". It is not even enough to say "it has a transition function"; it is essential to define the form of that function: if δ : S × Σ → S, the automaton is deterministic (for any pair of state and input symbol, the next state is uniquely determined by the transition function); if δ : S × Σ →
P(S), it is nondeterministic (for any state and input, the function allows a set of next states); even a Turing machine has a transition function and "goes from state to state", but it is not a finite automaton.

Working mechanically, you risk moving away from the solution

Conjunctive normal form has ∧ on the outside and ∨ on the inside. Applying the distributivity a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c), however, transforms a formula that is in the desired form into one that no longer is! So, if we have b → (a ∧ (c → d)) = ¬b ∨ (a ∧ (¬c ∨ d)), we want to distribute ∨, not ∧: (¬b ∨ a) ∧ (¬b ∨ ¬c ∨ d), and we are done! Otherwise, we complicate things (and the risk of making mistakes grows):

    ¬b ∨ (a ∧ (¬c ∨ d))
    = ¬b ∨ ((a ∧ ¬c) ∨ (a ∧ d))
    = ((¬b ∨ a) ∧ (¬b ∨ ¬c)) ∨ (a ∧ d)
    = (((¬b ∨ a) ∧ (¬b ∨ ¬c)) ∨ a) ∧ (((¬b ∨ a) ∧ (¬b ∨ ¬c)) ∨ d)
    = (¬b ∨ a ∨ a) ∧ (¬b ∨ ¬c ∨ a) ∧ (¬b ∨ a ∨ d) ∧ (¬b ∨ ¬c ∨ d)

We have obtained a conjunctive normal form, but it is much more complicated. Looking carefully, we see that the first clause, ¬b ∨ a, is included in the second, ¬b ∨ ¬c ∨ a, and in the third, ¬b ∨ a ∨ d, so by absorption (p ∧ (p ∨ q) = p) we can eliminate clauses 2 and 3, reaching the same result as before with much more effort.

Grammars generate only strings of terminals

One problem asked for the strings of length 3 [...] (bAAS, in which each nonterminal will generate at least one letter), and for the remaining strings with nonterminals, aAS and bAc, we note that we must restrict ourselves to the productions that yield a single letter, obtaining aac and bac. A more systematic solution is to treat each symbol separately: S generates AS, then AAS, AAAS, etc., so A^n S with n ≥ 1. The only production that stops is S ::= c, so we obtain S = A^n c, with n ≥ 0. Similarly, A generates bA, bbA, bbbA, ..., so b^n A with n ≥ 1. Combining with the stopping variant A ::= a we obtain A = b*a (a regular language). Since A has length at least 1, in S = A^n c we must have n [...]

[...] and v' = (v'1, ..., v'n). A path in the transition graph is defined as a sequence of states such that N(v_i, v_{i+1}) is true for every i ≥ 0. In
addition, we define a set of initial states, and all computations are performed on states reachable from this set. A set of states can also be represented by a Boolean formula which evaluates to true if and only if its variables are assigned the values of the variables in a state in the set. Note that if S(v) is a formula representing a set of states and N(v, v') is a formula for the transition relation, the formula ∃v [S(v) ∧ N(v, v')] represents the set of successors to states in S(v). This operation can be thought of as a function mapping a set of states S(v) to the set of its successors.

We use binary decision diagrams (BDDs) to efficiently represent Boolean formulas and to manipulate them using the standard Boolean operations. Because of the close relationship between a Boolean formula, its BDD, and the set of states satisfying the formula, we identify these three entities. In particular, sets and set operations are more intuitive than Boolean operations on formulas or BDD operations, so we present our algorithms using sets, but the implementation uses BDDs and the corresponding BDD operations.

3 Quantitative Timing Algorithms

We first present the lower bound algorithm (Figure 1). The algorithm takes two sets of states as input, start and final. It returns the length of (i.e., the number of edges in) a shortest path from a state in start to a state in final. If no such path exists, the algorithm returns infinity. The function T(S) gives the set of states that are successors of some state in S. The function T, the sets of states R and R', and the operations of intersection and union can all be easily implemented using BDDs.

The first algorithm is relatively straightforward. Intuitively, the loop in the algorithm computes the set of states that are reachable from start. If at any point we encounter a state satisfying final, we return the number of steps taken to reach the state.

Next, we consider the upper bound algorithm (Figure 2). This algorithm also takes start and final as input.
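As a rough illustration of the lower bound computation described above, the following sketch works on explicit sets instead of BDDs. The representation is an assumption made only for this example: states are integers, the transition relation is a list of edges, and "infinity" is modelled by None. The names S, successors and lower are chosen here for illustration.

```ocaml
(* Explicit-state sketch; the paper's implementation uses BDDs instead.
   Assumption: states are ints, the transition relation is an edge list. *)
module S = Set.Make (struct
  type t = int
  let compare = compare
end)

(* T(s): the states that are successors of some state in s. *)
let successors edges s =
  List.fold_left
    (fun acc (u, v) -> if S.mem u s then S.add v acc else acc)
    S.empty edges

(* Length of a shortest path from a state in start to a state in final.
   Returns None when no such path exists (the "infinity" case). *)
let lower edges start final =
  let hits r = not (S.is_empty (S.inter r final)) in
  let rec loop i r =
    if hits r then Some i
    else
      (* One more step of forward reachability. *)
      let r' = S.union (successors edges r) r in
      if S.equal r' r then None else loop (i + 1) r'
  in
  loop 0 start
```

For example, with edges [(0, 1); (1, 2); (2, 3); (0, 2)], calling lower on start = {0} and final = {3} follows the path 0, 2, 3 and yields Some 2. The loop stops either when the reached set intersects final or when no new states are added.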
The upper bound algorithm returns the length of a longest path from a state in start to a state in final. If there exists an infinite path beginning in a state in start that never reaches a state in final, the algorithm returns infinity. The function T^-1(S') gives the set of states that are predecessors of some state in S'. We also denote by not_final the set of all states that are not in final. As before, the algorithm is implemented using BDDs.

    proc lower (start, final)
      i = 0;
      R = start;
      R' = T(R) ∪ R;
      while (R' ≠ R ∧ R ∩ final = ∅) do
        i = i + 1;
        R = R';
        R' = T(R') ∪ R';
      if (R ∩ final ≠ ∅) then return i;
      else return ∞;

Figure 1: Lower Bound Algorithm

    proc upper (start, final)
      i = 0;
      R = TRUE;
      R' = not_final;
      while (R' ≠ R ∧ R' ∩ start ≠ ∅) do
        i = i + 1;
        R = R';
        R' = T^-1(R') ∩ not_final;
      if (R = R') then return ∞;
      else return i;

Figure 2: Upper Bound Algorithm

The upper bound algorithm is more subtle than the previous algorithm. A backward search from the states in not_final is more convenient in this case than a forward search. Proofs of both algorithms can be found in [...].

We have also developed algorithms that calculate the minimum and the maximum number of times a specified condition cond can hold on a path from a set of starting states to a set of final states. For this purpose, we define a new state-transition system, in which the states are pairs consisting of a state in the original system and a positive integer, denoting the number of states in cond that have been traversed on such a path. Thus, if the original state-transition graph has state set S, then the augmented state set will be S_a = S × ℕ. The augmented transition relation N_a ⊆ S_a × S_a is defined in terms of the original transition relation N ⊆ S × S by incrementing the integer component k whenever a state in cond is traversed:

    N_a((s, k), (s', k')) = N(s, s') ∧ ((s' ∈ cond ∧ k' = k + 1) ∨ (s' ∉ cond ∧ k' = k))

The algorithms use the augmented transition relation and the value of the counter component k to produce the desired information. We have applied the same technique to a more powerful model of real-time systems, timed transition graphs, in which the time taken by a transition is defined by a time interval. These extensions can also be found in [...].

4 Example: An Aircraft Control System

One of the most critical applications of real-time systems is in aircraft control. It is extremely important that time bounds are not violated in such systems. Because of the risks involved in the failure of an aircraft, only conservative approaches to design and implementation are routinely used. Many modern techniques for software design, such as formal methods, are not commonly employed. We believe that formal verification can be very useful in increasing the reliability of these systems by assisting in the validation of schedulability and response times of the various components.

This section briefly describes an aircraft control system used in military airplanes. Such a control system can be characterized by a set of sensors and actuators connected to a central processor. This processor executes the software to analyze sensor data and control the actuators. Our model describes this control program and determines whether its timing constraints are met. The requirements used are similar to those of existing military aircraft, and the model is derived from the one described in [...].

The aircraft controller is divided into systems and subsystems, each of which performs a specific task in controlling the airplane:
• Navigation: Computes aircraft position.
• Radar Control: Receives and processes data from radars. It also identifies targets and target position.
• Radar Warning Receiver: This system identifies possible threats to the aircraft.
• Weapon Control: Aims and activates aircraft weapons.
• Display: Updates information on the pilot's screen.
• Tracking: Updates target position. Data from
this system is used to aim the weapons.
• Data Bus: Provides communication between processor and external devices.

Timing constraints for each subsystem are derived from factors such as required accuracy, human response characteristics and hardware requirements. The following table presents the subsystems being modelled, as well as their major timing requirements.

    System    Subsystem       Period (ms)  Exec (ms)  % CPU   Priority
    Display   status update   200          3          1.50    12
              keyset          200          1          0.50    16
              hook update     80           2          2.50    [...]
              graph display   80           9          11.25   [...]
              store update    200          1          0.50    20
    RWR       contact mgmt    25           5          20.00   72
    Radar     target update   50           5          10.00   60
              track filter    25           2          8.00    [...]
    NAV       nav update      50           8          16.00   56
              steer cmds      200          3          1.50    24
    Track     target update   100          5          5.00    32
    Weapon    weapon prot     200*         1          0.50    28
              weapon aim      50           3          6.00    54
              weapon rel      200**        3          1.50    58
    Data Bus  poll device     40           1          2.50    68

    *  Weapon protocol is an aperiodic process with a deadline of 200ms.
    ** Weapon release has a period of 200ms, but its deadline is 5ms.

In order to enforce the different timing constraints of the processes, priority scheduling is used. The priority assignment has been done according to the RMS theory. Concurrent processes are used to implement each subsystem. With the exception of the weapon system, all other systems contain only periodic processes. The weapon system contains a mixture of periodic and aperiodic processes. It is activated when the display keyset subsystem identifies that the pilot has pressed the firing button. This event causes the weapon protocol subsystem to be activated. It then signals the weapon aim subsystem that has been previously blocked. Weapon aim is then scheduled to be executed every 50ms. It aims the aircraft weapons based on the current position of the target. It also decides when to fire and then starts the weapon release subsystem. The firing sequence can be aborted until weapon release is scheduled, but not after this point. Weapon release then executes periodically and fires the weapons 5 times, once per second.

5 Verification of the Aircraft Control System

We have implemented this control system in the SMV language. The SMV model checker has been used to verify its functional correctness, while its timing correctness has been checked using the quantitative algorithms described in this paper. In order to optimize response time, we have implemented a preemptive scheduler. However, preemptability is a feature that may not always be available. Non-preemptive schedulers are easier to implement and allow for simpler programs, but usually increase response time for higher priority processes. To assess the effect of preemption in our system we have also implemented a non-preemptive scheduler.

Using the model described above, we were able to compute the schedulability of the system. This is one of the most important properties of a real-time system. It states that no process will miss its deadline. In this example the deadlines are the same as the periods (except for the weapon release subsystem). We determine schedulability by computing the minimum and maximum execution times for each process and checking if they always finish before their deadline. The RMS theory checks for schedulability by computing the CPU utilization of the process set. It may not provide any schedulability information if the utilization exceeds a certain threshold. Our method, however, is always able to determine schedulability. Moreover, it only requires that processes be modelled as state graphs, while RMS imposes restrictions on their behavior.

The following table summarizes the execution times computed by our algorithms for both the preemptive and non-preemptive schedulers. Processes are shown in decreasing order of priority.

                                   Execution times
                                   preemptive    non-preemptive
    Subsystem            Deadline  min    max    min    max
    Weapon release       5         3      3      3      9
    Radar track filter   25        2      5      2      10
    Contact mgmt         25        7      10     7      15
    Data bus poll        40        1      11     1      14
    Weapon aim           50        10     14     2      18
    Radar target upd     50        12     19     12     19
    NAV update           50        20     34     20     27
    Display graphic      80        10     44     10     43
    Display hook upd     80        14     46     14     47
    Track target upd     100       26     51     26     51
    Weapon protocol      200       1      21     3      46
    NAV steer cmds       200       35     85     36     74
    Display store upd    200       36     95     37     97
    Display keyset       200       37     96     38     98
    Display status upd   200       40     99     41     101

We can see from this table that the process set is schedulable using preemptive scheduling. An analysis of a similar process set using RMS showed that only the first eight processes were guaranteed to meet their deadlines. From our results we can also identify many important parameters of the system. For example, the response time is usually very low for best-case computations, but it is also good for the worst case. Most processes take less than half their required time to execute. This indicates that the system is still not close to saturation, although the total CPU utilization is high.

Notice also that preemption does not have a big impact on response times. Except for the most critical process, all others maintain their schedulability if a non-preemptive scheduler is used. Moreover, we can see that non-preemption causes weapon release to miss its deadline, but by a relatively small amount. If a preemptive scheduler were expensive, reducing the CPU utilization slightly might make the complete system schedulable without changing the scheduler. By having such information, the designer can easily assess the impact of various alternatives to improve the performance, without having to change the implementation. It should be noted that an analysis of this type can't be done using methods like the RMS utilization test or reachability computation.

The algorithms described can be used to analyze the system in many different ways. For example, the effect of preemption on execution time can be assessed as follows. We have computed the maximum and minimum execution times for processes after they have been granted the CPU. If minimum and maximum are not the same, the process can be preempted after starting execution. For example, the display graphic subsystem can finish in as
little as 7ms and in as much as 14ms after it starts execution. In other words, preemption overhead can be as high as 7ms for this subsystem. The NAV steering subsystem has a minimum of 1ms and a maximum of 44ms. This means that other processes can delay it for 43ms. It is clear that NAV steering can be preempted for a longer time than display graphic, since it has lower priority. Our results, however, allow us to determine how much longer it can be preempted. In a similar fashion, we can compute the priority inversion time for high priority processes. This can aid in identifying the reasons why a system is not predictable, and help correct its behavior.

We examine one more property of this particular model. The weapon system is critical to the aircraft. It is very important that it respond quickly to the pilot's command. However, when a pilot presses the firing button, many subsystems are involved in identifying and responding to this event. By computing the minimum and maximum times between pressing the fire button and the execution of the weapon release process we are able to determine if the weapon system responds quickly enough to satisfy the aircraft requirements. In our example, the minimum time is 120ms and the maximum time is 167ms, not accounting for the possibility that the firing sequence may be aborted.

Again, this type of analysis may be difficult to do with other tools. The RMS schedulability test cannot give tight bounds on specific response times for such properties, since its only parameter is CPU utilization. Algorithms that use reachability analysis are also inappropriate for such analysis. Specific exceptions, with previously defined time bounds, would have to be added to the model to observe these characteristics.

The finite-state model was implemented in about 600 lines of SMV code. The final model has about 10^15 states, and the transition relation uses approximately 4600 BDD nodes. To compute each property described above took between 5 and 15 seconds
using an i486-based workstation.

6 Conclusion

This paper proposes a general framework for computing quantitative characteristics of finite-state real-time systems. We have devised algorithms that calculate exact numerical bounds on the delay between two specified events, as well as on the frequency of the occurrence of a condition within a given interval. Rather than just determining the correctness of the model, the results computed by our algorithms provide hints about its behavior that can be useful in improving the performance of the system.

Our method can be easily integrated with model checking techniques. In fact, the lower and upper bound algorithms have been added to the most recent version of the SMV model checking system. Using this implementation we demonstrate the practical importance of our approach by analyzing a model of an aircraft control system. We have been able to obtain stronger results than those produced using traditional methods for real-time system verification.

We have found this approach to be very flexible. We have shown how quantitative characteristics can be computed for state-transition graphs. In addition, we have extended the algorithms to models in which transitions may take more than one time unit. We also plan to investigate the application of these techniques to other models of computation.

We believe that the quantitative information that our method provides can be extremely useful to designers during the development of real-time systems. We are confident that these techniques will prove practical in the verification of a variety of other realistic designs.

References

R. Alur and T. A. Henzinger. Logics and models of real-time: a survey. In Lecture Notes in Computer Science, Real-Time: Theory in Practice. Springer-Verlag, 1992.
R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8), 1986.
J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic model checking: 10^20 states and beyond. In LICS, 1990.
S. V. Campos and E. M. Clarke. Real-time symbolic model checking for discrete time models. In First AMAST International Workshop in Real-Time Systems, 1993.
S. V. Campos, E. M. Clarke, W. Marrero, M. Minea, and H. Hiraishi. Computing quantitative characteristics of finite-state real-time systems. Technical Report CMU-CS-94-147, Carnegie Mellon University, 1994.
E. A. Emerson, A. K. Mok, A. P. Sistla, and J. Srinivasan. Quantitative temporal reasoning. In Lecture Notes in Computer Science. Springer-Verlag, 1990.
A. N. Fredette and R. Cleaveland. RTSL: a language for real-time schedulability analysis. In IEEE Real-Time Systems Symposium, 1993.
R. Gerber and I. Lee. A proof system for communicating shared resources. In IEEE Real-Time Systems Symposium, 1990.
J. P. Lehoczky, L. Sha, J. K. Strosnider, and H. Tokuda. Fixed priority scheduling theory for hard real-time systems. In Foundations of Real-Time Computing: Scheduling and Resource Management. Kluwer Academic Publishers, 1991.
C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1), 1973.
C. D. Locke, D. R. Vogel, and T. J. Mesler. Building a predictable avionics platform in Ada: a case study. In IEEE Real-Time Systems Symposium, 1991.
K. L. McMillan. Symbolic model checking: an approach to the state explosion problem. PhD thesis, SCS, Carnegie Mellon University, 1992.
L. Sha, M. H. Klein, and J. B. Goodenough. Rate monotonic analysis for real-time systems. In Foundations of Real-Time Computing: Scheduling and Resource Management. Kluwer Academic Publishers, 1991.

Safety Interfaces for Component-Based Systems

Jonas Elmqvist (1), Simin Nadjm-Tehrani (1), and Marius Minea (2)
(1) Department of Computer and Information Science, Linköping University, {jonel, simin}@ida.liu.se
(2) "Politehnica" University of Timisoara and Institute e-Austria Timisoara, marius@cs.utt.ro

Abstract. This paper addresses the problems [...] with a built-in declaration of [...] in assumed environments. This
component model captures the logic of the design at a high abstraction level, and could be applied to software or (reconfigurable) hardware designs. Traditional risk assessment techniques such as Fault-tree analysis (FTA) and Failure modes and effects analysis (FMEA) [...] deal with the effects of independent faults. Although evaluating fault tolerance at system level is an important part of safety analysis, rigorous methods are only [...] in industry when it comes to systems with significant digital components.

R. Winther, B. A. Gran, and G. Dahll (Eds.): SAFECOMP 2005, LNCS 3688, [...], 2005. © Springer-Verlag Berlin Heidelberg 2005

Our goal is to provide a formal means to support the system integrator. When acquiring a new component for inclusion into a system, the integrator is informed whether the component can potentially threaten the system-level safety (in the same spirit as FMEA). The integrator is also supported in analysis of fault tolerance at system level, the result of which will indicate all single or multiple component-level faults that will necessarily lead to violation of safety (in the same vein as FTA). Unlike functional correctness analysis, here the goal is to focus on risks associated with external faults, not to eliminate design faults.

The contributions of this paper are as follows. We present a component model that includes safety interfaces. These describe how a component behaves with respect to a given system-level safety property [...] in presence of single and multiple faults [...] by referring to the safety interfaces. Once the relevant fault-failure chains are rigorously identified, they can be handled using standard assessment routines, [...] techniques.

1.1 Related Work

To our knowledge, there is no previous [...] leading to lower service levels that may in particular be due to [...] scenarios [19]. Jürjens defines an
extension of the UML syntax in which stereotypes, tags, and values can be used to capture [...] of components in a system (corruption, delay, loss) [...]. The merit of the model is to narrow the gap between a system realised as a set of functions and a system realised as a set of components. Li et al. define feature-oriented interfaces for modules that encapsulate crosscutting system properties. The focus of this work is feature integration, including features that introduce new vocabulary. A recent approach for formal treatment of crosscutting concerns in reconfigurable components is given by Tesanovic et al. [...], where extended timed automata are used to capture models of components with an interface for characterising [...]. [...] the notion of refinement is usually trace inclusion but can also be simulation [17]. Our rules are derived from those of Alur and Henzinger for reactive modules [...].

2 Components and Fault Models

A component is [...] interfaces are only [...] input and output ports [...]. For safety analysis at system level, these simple interfaces are insufficient. More behaviour information must be provided to make interfaces usable for analysis of failures in presence of faults. We propose a formal component model with two elements: its functional behaviour and a safety interface, which describes the behaviour in presence of faults in the environment. This safety interface can [...] perform safety analysis at system level, [...] variables of the module. The execution of a module produces a state sequence q = q0 ... qn. A trace σ is the corresponding sequence of observations [...]. [...] property φ, written M |= φ, if all traces of M belong to φ. This work focuses on safety properties [...] as opposed to liveness properties.

Composing two modules into a single module creates a new module whose behaviour captures the interaction between the component modules.

Definition 2 (Parallel composition). Let M = (V^M, Q_0^M, δ^M) and N = (V^N, Q_0^N, δ^N) be two modules with [...]. The parallel composition of M and N, denoted by M || N, is defined as [...] the resulting transition relation [...] be nonblocking, i.e., has a next state for any combination of current state and inputs. In this case, we call the two modules [...].

We relate modules via trace semantics: a module M refines a module N if N has more behaviours than M, i.e., all possible traces of M are also traces of N.

Definition 3 (Refinement). Let M = (V^M, Q_0^M, δ^M) and N = (V^N, Q_0^N, δ^N) [...] which produces the faulty output [...] composition of F_j and E, which has the same variables as E and can thus be composed with M. Free inputs to M are viewed as unconstrained outputs of E.

Definition 5 (Composition with Fault). Let E be a module with [...] and F_j a fault mode with output [...] and input [...]. Denote F_j ∘ E = F_j || E[...], where E[...] is the module E with the variable substitution [...]. Our fault modes are unrestricted and can affect [...] in an arbitrary way. Other types of fault modes can be [...] faults.

Next, we use the obtained environment abstraction to determine more restrictive environments under which the module is resilient, first to a chosen set of single faults and then for the occurrence of fault pairs.

3.1 Generating [...]

If M is a module [...] least restrictive environment E^φ, in order to satisfy [...] can be [...] as shown in Figure 1. The algorithm [...]

Fig. 1. The abstraction algorithm

[...] uses a model checker to check whether the module M in parallel with an
environment E satisfies the safety property φ, i.e., M || E |= φ. Initially, the algorithm starts out with an empty constraint E^0 on the environment, and at each iteration i the algorithm strengthens the constraints E^i by analysing the counter-example generated by the model checker and removing the forbidden states. This corresponds to removing behaviour from (or strengthening) the environment. In the next iteration, the environment E^(i+1) should at least not exhibit the behaviour reflected by the counter-example at iteration i. The algorithm stops at a fixpoint when [...].

Proposition 1. The environment E^φ generated by the algorithm is the least restrictive environment in which M satisfies the property φ. That is, for any environment E, M || E |= φ iff E ≤ E^φ.

The proof can be done by [...] reasoning [...] that synthesise a necessary and sufficient environment for a [...] module M.

3.2 Identification of Fault Behaviours

Let M be a module that satisfies a safety property φ when placed in an environment E, assuming the ideal case without faults: M || E |= φ. Let F_j be a fault mode on variable v_j which is an input to M from E. Denote by ∀v_j.E the module with [...].

Proposition 2. If M || E |= φ and ∀v_j.E exists, then M || (F_j ∘ ∀v_j.E) |= φ, i.e., M is resilient to fault F_j in the environment ∀v_j.E.

By definition of ∀v_j.E, any state can be extended to a state of E with an arbitrary value of v_j. Thus, [...] ∀v_j.E^φ is the least restrictive environment in which module M is resilient to [...]. This result gives an environment in which a module is resilient to [...] ascertain the sensitivity of the system to single (respectively multiple) faults.

4.1 General Setup

Consider a system safety property φ and a component with safety interface SI^φ. As delivered by the component provider, SI^φ specifies an environment in which the component
is safe, assuming no faults; another environment in which the component is resilient to a set of single faults; and a safe environment for each considered pair of simultaneous faults.

Consider proving $M_1 \parallel M_2 \parallel \cdots \parallel M_n \models \varphi$ in the presence of a fault $F_j$ in $M_1$. If a safety interface $SI^{\varphi}$ of $M_1$ is known, with single = [...] and $F_j \in$ [...], it suffices to show that $M_2 \parallel \cdots \parallel M_n$ [...].

[...] In this study we only consider the three components H-ECU, PLD1 and PLD2. [...]

Fig. 2. The hydraulic leakage detection system

5.3 Analysis of Fault Tolerance

Modules: PLD1, PLD2 and HECU are represented as synchronous modules.

Fault modes: A set of fault modes $F_{PLD1}$, $F_{PLD2}$ and $F_{HECU}$ for each component has been identified. Every input to the components has been analysed and the possible faults have been modelled as corresponding fault modes.

Safety interface generation: The least restrictive environments $E_{PLD1}$, $E_{PLD2}$, $E_{HECU}$ of the components were generated by the algorithm of Section 3.1 using a SAT-based model checker (Prover plugin of the Esterel environment).

Safety Interfaces for Component-Based Systems  257

The least restrictive environment of PLD1 that makes the system satisfy the safety property leaves all the inputs to PLD1 unconstrained. By Prop. 2, PLD1 in the environment $E_{PLD1}$ is also resilient to all faults in $F_{PLD1}$. Analysis shows that, due to their fault-tolerant design, HECU and PLD2 satisfy the safety property with no constraints on their environment whatsoever, i.e., $E_{PLD2} = E_{HECU} = \mathit{True}$. Since none of $E_{PLD1}$, $E_{PLD2}$ and $E_{HECU}$ constrain any of the input variables of their corresponding component, these components are resilient to all single faults. Hence, the single fault resilience set of
each safety interface will contain every fault mode in the corresponding fault mode set. The generated minimal environments also reveal that the components are resilient to all double faults, creating a safety interface that includes all pairs of faults in the double fault resilience portion of the safety interface.

Single-component faults: After generating the safety interfaces for the three components in the application (w.r.t. single and double faults), the single-component fault analysis becomes trivial. No single or double fault in one component will cause a threat to system-level safety, since all faults are included in the single fault resilience portion and all pairs of faults are included in the double fault resilience portion of the safety interface.

Multiple-component faults: By checking [...] for all module-fault pairs $(M_i, F_k)$ where $M_i \in \{PLD1, PLD2, HECU\}$ and $F_k \in F_{PLD1} \cup F_{PLD2} \cup F_{HECU}$, we could conclude that no double [...] would pose a threat to system [...].

[...] is represented by using two sets of variables, one set for the current state and another set for the next state. Each variable in the next state set corresponds to one variable in the current state set. If $s$ is represented by the formula $f_s$ over the current state variables, and $t$ is represented by the formula $f_t$ over the next state variables, then the transition $s \rightarrow t$ is represented by $f_s \land f_t$. For example, a transition from state $(a, \lnot b, \lnot c)$ to state $(a, b, \lnot c)$ is represented by the formula $a \land \lnot b \land \lnot c \land a' \land b' \land \lnot c'$. The transition relation of a graph is constructed from the disjunction of all transitions in the graph. The meaning of the formula representing
the transition relation is the following: there exists a transition from state $s$ to state $t$ iff the substitution of the variable values for $s$ in the current state variables and of those of $t$ in the next state variables of the transition relation yields true.

84  S. Campos et al. / Science of Computer Programming 29 (1997) 79-98

In the same way as boolean formulas can represent sets of states, they can also represent sets of transitions. Symbolic model checking takes advantage of this fact by grouping sets of transitions into a single formula, which often significantly simplifies traversing the graph. The clustering of transitions happens automatically when boolean formulas are implemented using BDDs. This occurs because of the canonicity of BDDs: given a fixed variable ordering, a boolean formula is represented by a unique BDD. Therefore, the order in which the transition relation is constructed does not affect the final result; the canonicity property guarantees that the same transitions will be clustered according to the formulas that represent them. This technique is one of the main reasons for the efficiency of symbolic algorithms.

2.3 Computation tree logic

The properties to be verified by the model checker are expressed in computation tree logic, CTL. Computation trees are derived from state transition graphs. The graph structure is conceptually unwound into an infinite tree rooted at the initial state. Paths in this tree represent all possible computations of the program being modelled. Formulas in CTL refer to the computation tree derived from the model. CTL is classified as a branching time logic, because it has operators that describe the branching structure of this tree.

Formulas in CTL are built from atomic propositions (in our method, each proposition corresponds to a state variable in the model), boolean connectives $\lnot$ and $\land$, and temporal operators. Each operator consists of two parts: a path quantifier followed by a temporal operator. Path quantifiers indicate that
the property should be true of all paths from a given state (A), or some path from a given state (E). The temporal operators describe how events are ordered with respect to time for a path specified by the path quantifier. They have the following informal meanings:

• F $\varphi$ ($\varphi$ holds sometime in the future) is true of a path if there exists a state in the path that satisfies $\varphi$.
• G $\varphi$ ($\varphi$ holds globally) is true for a path if $\varphi$ is satisfied by all states on the path.
• X $\varphi$ ([...]
• [...] AF ack): A request is always followed by an acknowledge.
• AG(re[...]

[...] ¬p2.readable) for each pair of caches p1 and p2. The proposition p1.writable is true when p1 is the only cache that has a valid copy of the cache line. Similarly, p2.readable is true when p2 has one of possibly many valid copies.

    proc minimum (start, final)
        i = 0;
        R = start;
        R′ = T(R) ∪ R;
        while (R′ ≠ R ∧ R ∩ final = ∅) do
            i = i + 1;
            R = R′;
            R′ = T(R′) ∪ R′;
        if (R ∩ final ≠ ∅)
            then return i;
            else return ∞;

    proc maximum (start, final)
        i = 0;
        R = TRUE;
        R′ = not final;
        while (R′ ≠ R ∧ R′ ∩ start ≠ ∅) do
            i = i + 1;
            R = R′;
            R′ = T⁻¹(R′) ∩ not final;
        if (R = R′)
            then return ∞;
            else return i;

Fig. 2. Minimum and maximum delay algorithms

Consistency is described by requiring that if two caches have copies of a cache line, then they agree on the data in that line:

    AG(p1.readable ∧ p2.readable -> p1.data = p2.data)

Similarly, if memory has a copy of the line, then any cache that has a copy must agree with memory on the data:

    AG(p.readable ∧ ¬m.memory-line-modified -> p.data = m.data)

The variable m.memory-line-modified is false when memory has an up-to-date copy of the cache line. The last property expresses that it is always possible for a cache to get read or write access to the line:

    AG EF p.readable ∧ AG EF p.writable

Several errors have been found in this analysis that were not previously known. For
example, one counterexample showed an execution trace in which one processor had a cache line in the shared unmodified state, while a second one had the same cache line in the exclusive modified state. Another error showed a deadlock in the hierarchical configuration. Several different configurations have been verified, the largest one with three bus segments, eight processors, and over $10^{30}$ states.

4 Quantitative algorithms

This section presents algorithms used for quantitative analysis and performance evaluation of models. First we describe algorithms that compute the minimum and maximum time delays between specified events. Then we show algorithms that determine the minimum and maximum number of times a given condition holds on any path from a set of starting states to a set of final states. Both algorithms have been used in the example presented subsequently.

4.1 Minimum and maximum delay algorithms

In order to simplify the presentation, we make some assumptions. All computations are performed on states reachable from a predefined set of initial states. We also assume that the transition relation is total.

We consider the minimum delay algorithm first (Fig. 2). The algorithm takes two sets of states as input, start and final. It returns the length of (i.e., number of edges in) a shortest path from a state in start to a state in final. If no such path exists, the algorithm returns infinity. The function $T(S)$ gives the set of states that are successors of some state in $S$ (it is computed using image computation, as discussed above). The function $T$, the state sets $R$ and $R'$, and the operations of intersection and union can all be easily implemented using BDDs.

The first algorithm is relatively straightforward. Intuitively, the loop in the algorithm computes the set of states that are reachable from start. If at any point we encounter a state satisfying final, we return the number of steps taken to reach that
state.

Next, we consider the maximum delay algorithm. This algorithm also takes start and final as input. It returns the length of a longest path from a state in start to a state in final. If there exists an infinite path beginning in a state in start that never reaches a state in final, the algorithm returns infinity. The function $T^{-1}(S')$ gives the set of states that are predecessors of some state in $S'$ (i.e., $T^{-1}(S') = \{ s \mid N(s, s') \text{ holds for some } s' \in S' \}$). We also denote by not-final the set of all states that are not in final. As before, the algorithm is implemented using BDDs; however, a backward search is required in this case.

4.2 Condition counting algorithms

In many situations we are interested not only in the length of a path from a set of starting states to a set of final states, but also in measures that depend on the number of states on the path that satisfy a given condition. For example, we may wish to determine the minimum (maximum) number of times a given condition holds on any path from starting to final states.

Both algorithms in this section take as input three sets of states: start, cond and final. The algorithms compute the minimum and the maximum number of states that belong to cond, over all finite paths that begin with a state in start and terminate upon reaching final. To guarantee that the minimum (maximum) is well-defined, we assume that any path beginning in start must reach a state in final in a finite number of steps. This can be checked using the maximum delay algorithm described in the previous section. Finally, we ensure that all computations involve only reachable states, by intersecting start with the set of reachable states computed a priori.

To keep track at each step of the number of states in cond that have been traversed, we define a new state-transition system, in which the states are pairs consisting of a state in the original system and a non-negative integer. Thus, if the original state-transition graph has state set $S$, then the augmented
state set will be $S_a = S \times \mathbb{N}$.

    proc mincount (start, cond, final)
        current-min = ∞;
        R = {(s, 1) | s ∈ start ∧ cond} ∪ {(s, 0) | s ∈ start ∧ ¬cond};
        loop
            Reached-final = R ∧ Final;
            if Reached-final ≠ ∅ then
                m = min{k | (s, k) ∈ Reached-final};
                if m [...]

    [...] AF GNT)
    AG(start-transaction -> AF end-transaction)

The properties above show that the response time of PCI transactions is bounded, but they give no indication of their performance. We will use the algorithms described in Sections 4.1 and 4.2 to determine the response time for transactions. The results of our quantitative analysis also determine the correctness of the algorithm; for example, a transaction always finishes if its maximum response time is less than infinity.

In our performance analysis we will follow the structure of the protocol by computing the response time for each phase of the transaction separately. In this way we can have a better understanding of the behavior of the protocol. By computing the latency of each phase we are able to assess the efficiency of each step in the protocol and obtain the global behavior by adding individual figures.

Results will be grouped into two categories, total bus acquisition latency and total transaction latency. The first category corresponds to the total time between a request being made on the bus and the subsystem actually being able to use the bus. The second category represents the total usage of the bus, that is, the time between asserting the FRAME signal until the end of data transfer.

    Bus master   Arbitration   Bus acquisition   Total bus acquisition   Target      Total transaction
                 min   max     min   max         min   max               min   max   min   max
    ISA bridge   1     95      1     18          2     113               1     2     2     18
    SCSI         1     95      1     18          2     113               1     2     2     18
    Video        1     38      1     18          2     56                1     2     2     18
    Processor    1     38      1     18          2     56                1     2     2     18

Fig. 7. Response times for global round-robin policy (in clock cycles)

Fig. 7 shows the response times when the arbitration
policy is set to round-robin in all banks and transaction cancelling is not allowed. Notice that in all cases discussed in this paper the latency for the data transfer phase varies between 1 and 16 clock cycles; there is no overhead associated with it. For that reason, this column will not be shown in the tables.

From Fig. 7 we can see two interesting properties of the system. The total transaction latency is at most 18 clock cycles, and in this case 16 clock cycles of data are transmitted. This means that once a master is able to use the bus, it can send data very efficiently. Another characteristic of the protocol is reflected in the bus acquisition times. The maximum of 18 cycles corresponds to one transaction. After being granted the bus, the new master may have to wait for at most one more transaction to complete. This shows that once the bus is granted to a master, it will not be granted to another before the first one issues its transaction. Therefore no starvation can occur after a master is granted the bus. This property can be verified by

    AG(GNT -> A[GNT U FRAME])

A more intriguing result can be seen in the arbitration latency results. The first two subsystems can take almost twice as long to access the bus as the others. In a round-robin environment, all subsystems should be granted equal usage of the resource, but this is not true in our example. By analyzing the execution traces produced by our tools we are able to determine the reason for the unfair access to the bus. The problem arises from the connection of the request lines to the arbiter, as seen in Fig. 8. The ISA bridge and the SCSI controller are connected together to bank 0, while the video and the processor subsystems are alone in their banks. If bus traffic is high, the ISA bridge and the SCSI subsystems may have to wait for one another before their request reaches bank 2. Subsequently, they may have to wait for subsystems connected to the other banks to execute before being granted the bus. In other words, they
compete in both levels of arbitration, while the other subsystems only compete in the last level. This causes the worst-case latency to be approximately twice as long for these subsystems. We can conclude from these results that two-level arbitration may have a different behavior than an equivalent one-level arbiter. In this case the problem is caused by an asymmetric connection of request lines.

Fig. 8. The PCI arbiter

We can also use these results to analyze the overhead imposed by the communication protocol on the transaction time. We have already seen that after asserting the FRAME signal there is an overhead of 2 clock cycles. This overhead is independent of the transfer size. If a transaction is allowed to transfer more than 16 cache lines of data at once, the total utilization of the bus will increase. The designers of the bus can use this information to determine which is the best transfer size for a given system. The following two formulas have been used to verify the above statements:

    AG(FRAME -> AF≤2 (state = DATA_XFER))
    AG((state = DATA_XFER) -> A[state = DATA_XFER U end-transaction])

The first formula states that at most two cycles after the transaction starts, it will enter the data transfer phase. The second formula states that once a transaction is in the data transfer phase, it will continue in this phase until its end.

The overhead associated with arbitration can be computed in a similar way. It is more complex, however, because the arbitration latency depends not only on the transaction time, but also on the number of active request lines. We use the condition counting algorithms to uncover more details about this problem. We compute the number of transactions issued on the bus between the time a master requests access and the time it is granted the bus. Up to 5 transactions can be issued during this period for the ISA bridge and the SCSI subsystems, and up to 2 transactions can be issued for the
video and processor subsystems. Total transaction time for each of these intermediate transactions is 18 clock cycles. By comparing the total effective data transfer time with the maximum arbitration time, we can see that each intermediate transaction has an arbitration time of one clock cycle. These results are also valid for the video and processor subsystems. We can conclude that the arbitration latency can be computed by the formula:

    Arbitration Latency = n * (Transaction Latency + 1),

where n is the maximum number of intermediate transactions that can be issued between a request and the corresponding grant (computed with the condition counting algorithms). For the round-robin results this gives 5 * (18 + 1) = 95 cycles for the ISA bridge and SCSI subsystems and 2 * (18 + 1) = 38 cycles for the video and processor subsystems. This formula does not depend on the maximum data transfer size.

The above results assume a global round-robin policy. The behavior of the system under a fixed priority arbitration policy has also been studied, and the results can be seen in Fig. 9. The ISA bridge is the highest priority subsystem on the bus. Its response time is much lower in the fixed priority configuration than in the round-robin one. However, all other subsystems may starve, since the ISA bridge can continuously issue transactions. Notice that the arbitration time, but not the transaction time, is affected by the arbitration policy. These response times can be used by the designer to check if the performance of the PCI bus is adequate for a critical application. Other combinations of arbitration policies are possible, but are not presented here for the sake of brevity.

    Bus master   Arbitration   Bus acquisition   Total bus acquisition   Target      Total transaction
                 min   max     min   max         min   max               min   max   min   max
    ISA bridge   1     19      1     18          2     37                1     2     2     18
    SCSI         1     ∞       1     18          2     ∞                 1     2     2     18
    Video        1     ∞       1     18          2     ∞                 1     2     2     18
    Processor    1     ∞       1     18          2     ∞                 1     2     2     18

Fig. 9. Response times for global fixed priority policy (in clock cycles)

The model described above allows a detailed analysis of the behavior of the PCI bus protocol. Some features of the actual
bus, such as parity or data width, have been abstracted from our model, since they do not affect the timing of transactions. However, there are other features that do affect timing, such as the possibility of a transaction being cancelled. Errors on the bus may occur, the target may be slow, or unable to produce the data. For example, a transaction requesting data from the ISA bus will most likely experience a long delay, simply because of the relative speeds of the ISA and PCI buses. In the model described above this feature has been abstracted out by the assumption that the target of a transaction responds immediately. A more realistic model that allows transactions to be cancelled has also been implemented.

In order to account for long delay responses and aborted transactions we introduce the concept of transaction cancellation in our model. Transactions may be cancelled any time they are in progress. Transaction cancellations model the fact that in the actual PCI bus, whenever a target is unable to answer for a long time, it aborts the transaction, which is reissued later. We model this situation by cancelling the transaction and restarting it immediately by issuing another request. However, reissuing the transaction immediately would not correctly model the response time of a very slow target. To accommodate this situation, in our model a cancelled transaction is restarted as many times as necessary to accommodate the target response time.

    Bus master   Arbitration   Bus acquisition   Total bus acquisition   Target      Total transaction
                 min   max     min   max         min   max               min   max   min   max
    ISA bridge   1     95      1     18          2     113               1     6     2     132
    SCSI         1     95      1     18          2     113               1     6     2     132
    Video        1     38      1     18          2     56                1     6     2     75
    Processor    1     38      1     18          2     56                1     6     2     75

Fig. 10. Response times for global round-robin policy, maximum one cancel (in clock cycles)

Using the algorithms described we compute the overhead caused by cancelling and restarting a transaction, and use this result to determine the
number of retries for the response delay of a given target. Moreover, unlimited cancellations may cause starvation. Therefore, in order to compute the worst-case response time, we must limit the number of cancellations allowed. A cancellation brings the bus to the idle state, as can be verified by the following CTL formula:

    AG(ABORT -> AX BUS-IDLE)

As a consequence, consecutive cancellations have the same behavior, because a cancellation brings the system into the same state as before the transaction. Therefore, the total overhead caused by n cancellations is n times the overhead of a single cancellation, and it suffices to consider the situation in which at most one cancellation occurs.

The results for a global round-robin arbitration policy in the presence of at most one transaction cancellation are presented in Fig. 10. In this figure we can see that arbitration latency is not affected by transaction cancellations. The reason is that whenever a transaction is cancelled the current bus master releases the bus and becomes last in the round-robin queue. On the other hand, total transaction latency increases significantly. The execution trace of the transaction with the worst latency shows the following sequence of events (for the ISA bridge subsystem):

1. A transaction starts but is cancelled just before completion, after 17 clock cycles.
2. Another request is made to complete it in the next cycle (one extra clock cycle).
3. An arbitration sequence of 79 cycles follows.
4. A bus acquisition phase starts, taking 17 clock cycles.
5. The transaction starts again, completing in 18 cycles.

The arbitration sequence appearing in item 3 is the same as in the worst case, except that the request is made when the bus is already idle because of the cancellation. The difference of 16 clock cycles corresponds to one maximum data transfer phase done by another bus master, as shown by the counterexample for the worst-case arbitration latency (not presented for brevity). The total delay caused by the
first three items is the equivalent of a worst-case arbitration latency plus two clock cycles, caused by the cancellation. A bus acquisition phase and a transaction latency phase, in which no cancellation occurs, account for the last 35 cycles. We can see then that the overhead imposed by a transaction cancellation consists of a worst-case arbitration latency, a maximum bus acquisition phase, a maximum transaction latency (without cancellations) and one extra clock cycle. Again, this formula applies for the video and processor subsystems.

These results may be used to estimate the performance of an implementation of the PCI in the presence of transaction aborts. The formula derived gives the overhead for one transaction cancellation, and can be extended to many cancellations as well. In this manner, the worst response time in various configurations of the system can be computed.

To summarize our results, we have been able to:

• Model the PCI Local bus protocol and verify its correctness. In the round-robin case no starvation of subsystems occurs, and transactions always finish, even in the presence of limited cancellations.
• Determine the minimum and maximum latencies for each phase of the protocol, and show which phases are affected by changes in the parameters (such as arbitration policy and presence of cancellations).
• Compute response times independent of specific values for the data transfer phase.
• Determine response time in the presence of limited transaction aborts using the condition counting algorithms described.

These results allow the designers of the protocol to understand its actual behavior and how this behavior changes when parameters of the system are modified. We believe that this is valuable information when verifying and optimizing a new hardware system. This example shows that our method can be used to analyze the performance of modern hardware designs that have very complex behavior. It can
help improve the reliability of new products and increase the efficiency of the design process.

7 Conclusion

Model checking is a well established technology for formal verification. Using this method we have been able to verify systems of industrial complexity, such as the Futurebus+ cache coherence protocol. This analysis discovered errors in the protocol that were not known before. We have also extended model checking techniques to allow a quantitative analysis of models as well as performance evaluation. This paper presents algorithms to compute minimum and maximum path lengths as well as the minimum and maximum number of times an event occurs on all paths from a set of start states to a set of final states.

The analysis of the PCI Local bus demonstrates the power of this technique. The PCI is a high-performance bus design used in most Pentium processor based systems. By analyzing its performance we have shown that our techniques can be used in complex industrial designs. The measurements produced by these algorithms can be used to analyze design decisions before the system is actually implemented. In the PCI bus example, the description of the hardware can easily be modified to model different arbitration policies and different data transfer sizes. This flexibility allows designers to fine-tune system parameters in order to maximize efficiency.

The method presented can help determine the correctness of computer systems, as well as evaluate their performance. It is versatile enough to enable several types of analysis to be performed, and efficient enough to be used in complex modern industrial designs. We believe it can be of significant help in designing correct applications, as well as in reducing costs of the development process.

References

R. E. Bryant, Graph-based algorithms for boolean function manipulation, IEEE Transactions on Computers C-35 (8) (1986) 677-91.
J. R. Burch, E. M. Clarke, K. L. McMillan and D.
L. Dill, Sequential circuit verification using symbolic model checking, in: Proc. 27th ACM/IEEE Design Automation Conference, 1990, 46-51.
J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill and J. Hwang, Symbolic model checking: $10^{20}$ states and beyond, in: Proc. 5th Annual IEEE Symposium on Logic in Computer Science, 1990, 403-7.
S. V. Campos, The priority inversion problem and real-time symbolic model checking, Technical Report CMU-CS-93-125, Carnegie Mellon University, 1993.
E. M. Clarke, E. A. Emerson and A. P. Sistla, Automatic verification of finite-state concurrent systems using temporal logic specifications, ACM Trans. Programm. Languages Systems 8 (2) (1986) 244-263.
E. M. Clarke, O. Grumberg, H. Hiraishi, S. Jha, D. E. Long, K. L. McMillan and L. A. Ness, Verification of the Futurebus+ cache coherence protocol, in: L. Claesen, ed., Internat. Symp. on Computer Hardware Description Languages and their Applications (North-Holland, April 1993).
E. A. Emerson, A. K. Mok, A. P. Sistla and J. Srinivasan, Quantitative temporal reasoning, Lecture Notes in Computer Science, Vol. 531 (Springer, Berlin, 1990) 136-45.
IEEE Computer Society, IEEE Standard for Futurebus+: Logical Protocol Specification, March 1992, IEEE Standard 896.1-1991.
Intel Corporation, 82378 System I/O (SIO) - PCI Local Bus, 1993.
Intel Corporation, PCI Local Bus Specification, 1993.
K. L. McMillan, Symbolic Model Checking (Kluwer Academic Publishers, Dordrecht, 1993).

Computer Programming (Programarea calculatoarelor), Lecture 12. Marius Minea. 19 May 2006.

Files

As computer users, we refer to a file by its name. As programmers, we care about access to the file's contents, a sequence (stream) of bytes.

In stdio.h: the type FILE holds the elements needed to access a file (current position in the file, data buffer, error and EOF indicators). In a program we work with variables of type FILE * that are passed to the file functions (we never dereference them; we use them only to designate files).

Typical sequence of work: the file is opened, processed, then closed.

Predefined standard files
(opened automatically when the program starts):
- stdin: the standard input file (normally the keyboard)
- stdout: the standard output file (normally the screen)
- stderr: the standard error file (normally the screen)
(these are constants of type FILE * declared in stdio.h)

In fact, scanf, printf, etc. read from stdin and write to stdout.

Note: error messages should be written to stderr, so that they can be separated (by redirection) from the normal output messages.

    FILE *fopen(const char *path, const char *mode);

- arg 1: the file name (absolute, or relative to the current directory)
- arg 2: the opening mode; the first character means:
  r: open for reading (the file must exist)
  w, a: open for writing; if the file does not exist, it is created; if it exists, it is truncated to length 0 (w) or data is added at the end (append, a)
  In addition, the mode string may also contain:
  +: also allows the other mode (reading or writing) besides the one given by the first character
  b: opens the file in binary mode (default: text mode)
- returns NULL on error (this must be tested!)
- otherwise, the returned value is used for all further work with the file

    int fclose(FILE *stream);

- writes whatever is left in the data buffers and closes the file
- returns 0 on success, EOF on error

Typical pattern for working with files (example: a file opened for reading and writing in text mode):

    FILE *fp;
    char *name = "f.txt";    /* or taken from argv[], or prompted for */
    if (!
(fp = fopen(name, "r+"))) {    /* text mode is the default */
        /* handle the error */
    } else {
        /* work with the file */
        if (fclose(fp))
            /* error on close */ ;
    }

During input/output in text mode, various conversions may take place, depending on the implementation (for example, translating \n into \r\n on DOS).
- text mode: only for files containing ordinary printable characters, \t, \n
- binary mode: for all other situations (even for text files); it guarantees an exact correspondence between the content written and the content read

Reading from and writing to a file use the file position indicator, which is advanced automatically by every operation. Consequently, the indicator must be repositioned appropriately when switching between reading and writing on the same file. For a file opened in dual mode (with +), do not read directly after writing without flushing the buffers (fflush) or repositioning the indicator; do not write directly after reading without repositioning the indicator, unless the read reached EOF.

Functions equivalent to those used so far:

    int fputc(int c, FILE *stream);    /* write a character to the file */
    int fgetc(FILE *stream);           /* read a character from the file */
    /* getc, putc: same as fgetc, fputc, but may be macros */
    int ungetc(int c, FILE *stream);   /* push character c back */
    int fscanf(FILE *stream, const char *format, ...);
    int fprintf(FILE *stream, const char *format, ...);
    int fputs(const char *s, FILE *stream);    /* write a string */
    int puts(const char *s);           /* write the string, then \n, to the output */
    char *fgets(char *s, int size, FILE *stream);

- fgets reads up to (and including) a newline, or at most size - 1 characters, and appends '\0' at the end, so a line can be read safely, without overflow
- it returns NULL if EOF occurs before anything has been read

    #include <stdio.h>

    void cat(FILE *fi)    /* print an open file character by character */
    {
        int c;
        while ((c = fgetc(fi)) != EOF)
            putchar(c);
    }

    int main(int argc, char *argv[])
    {
        FILE *fp;
        if (argc == 1)
            cat(stdin);                           /* read from standard input */
        else
            while (--argc > 0) {                  /* for each argument */
                if (!(fp = fopen(*++argv, "r")))  /* open, test */
                    fprintf(stderr, "can't open %s\n", *argv);
                else {                            /* print, close */
                    cat(fp);
                    fclose(fp);
                }
            }
        return 0;
    }

    void clearerr(FILE *stream);

resets the end-of-file and error indicators for the given file

    int feof(FILE *stream);      /* != 0: end of file reached */
    int ferror(FILE *stream);    /* != 0 on error for that file */

If a system call resulted in an error, the error code can be read from the global variable extern int errno; declared in errno.h. It can be used together with the function char *strerror(int errnum); from string.h, which returns a string describing the error. Alternatively, one can directly use the function

    void perror(const char *s);    /* stdio.h */

which prints the message s given by the user, a colon, and then the description of the error.

    void exit(int status);         /* stdlib.h */

terminates program execution normally:
- the buffers are written, open files are closed, temporary files are deleted
- the given integer code is returned to the operating system (see int main())

So far: functions oriented towards characters, lines and formatting (text files). To read or write a number of uninterpreted bytes (in binary format):

    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
    /* read / write nmemb objects of size bytes each */

The functions return the number of complete objects read or written correctly. If it is smaller than the number requested, the cause can be found using feof and ferror.

With them, we can write our own functions for each data type:

    size_t readint(int *pn, FILE *stream)      /* binary format */
    {
        return fread(pn, sizeof(int), 1, stream);
    }

    size_t writedbl(double x, FILE *stream)    /* binary format */
    {
return fwrite(&x, sizeof(double), 1, stream); } fprintf(fp, "7od", n); scrie intregul ca sir de cifre zecimale cu fwrite se scrie intregul in format binar (sizeof (int) octeti Programarea calculatoarelor Curs 12 Marius Minea Fisiere 9 #include #include #define MAX 512  * copiem cate un sector odata *  int filecopy(FiLE *fi, FiLE *fo) { char buf[MAX]; int size;  * nr octeti cititi *  while (!feof(fi)) { size = fread(buf, 1, MAX, fi);  * citeste MAX octeti *  fwrite(buf, 1, size, fo);  * scrie doar cati s-au citit *  if (ferror(fi) || ferror(fo)) return errno; return 0; Programarea calculatoarelor Curs 12 Marius Minea Fisiere 10 void main(int argc, char *argv[]) FiLE *fi, *fo; if (argc != 3) { fprintf(stderr, "usage: сору source destination n"); exit(l); } else { if (!(fi = fopen(argv , "rb"))) { fprintf (stderr, "70s: can’t open ° os: ", argv , argv[l]); perror(NULL);  * am scris deja mesajul * ; exit(errno); if (!(fo = fopen(argv , "wb"))) { fprintf (stderr, "70s: can’t open ° os: ", argv , argv ); perror(NULL); exit(errno); if (filecopy(fi, fo)) perror("Eroare la copiere"); if (fclose(fi) | fclose(fo)) perror("Eroare la inchidere"); Programarea calculatoarelor Curs 12 Marius Minea Pe langa citire scriere secventiala, e posibila pozitionarea in fisier: long ftell(FiLE *stream);  * pozitia de la inceputul fisierului *  int fseek(FiLE *stream, long offset, int whence);  * pozitionare *  Al treilea parametru: punctul de referinta pt pozitionarea cu offset: seek set (inceput), seek cur (punctul curent), SEEK END (sfarsit) void rewind(FiLE *stream);  * repozitioneaza indicatorul la inceput *  (echivalent CU (void)fseek(stream, OL, SEEK SET), plus clearerr Repozitionarea trebuie efectuata: - cand dorim sa "sarim" peste o anumita portiune din fisier - cand fisierul a fost scris, si apoi dorim sa revenim sa citim din el int fflush(FiLE *stream); scrie in fisier tampoanele de date nescrise pt fluxul de iesire stream Programarea calculatoarelor Curs 12 Marius Minea Fisiere 
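The positioning functions above can be combined with fread for random access to fixed-size binary records. The following is only a minimal sketch: the helper names file_size and read_int_at are hypothetical (they do not appear in the lecture), and it assumes an implementation where SEEK_END is meaningful for binary streams, as on POSIX systems.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: size in bytes of an open binary stream,
   or -1L on error. The original position is restored. */
long file_size(FILE *fp)
{
    assert(fp != NULL);
    long here = ftell(fp);                   /* remember the current position */
    if (here == -1L || fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    long size = ftell(fp);                   /* offset of the end = size */
    if (fseek(fp, here, SEEK_SET) != 0)      /* go back where we were */
        return -1L;
    return size;
}

/* Hypothetical helper: reads the int stored at 0-based position index
   in a file of binary ints. Returns 1 on success, 0 otherwise. */
int read_int_at(FILE *fp, long index, int *out)
{
    assert(fp != NULL && out != NULL);
    if (fseek(fp, index * (long)sizeof(int), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof(int), 1, fp) == 1;
}
```

Because read_int_at always calls fseek before fread, it can safely be used right after a write on a stream opened in update mode, which is the repositioning rule stated earlier.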
Functions of the printf / scanf family can also use character strings as source or destination:
    int sprintf(char *s, const char *format, ...);
    int sscanf(const char *s, const char *format, ...);
With sprintf, the array being written to can overflow if it is not sized correctly (large enough). Recommended instead:
    int snprintf(char *str, size_t size, const char *format, ...);
in which writing is limited to size characters (including the terminating '\0') => the safe variant

Among similar functions, the one suited to the situation must be chosen. Example:
    int n, r; char *s, *end;
    n = atoi(s);               /* only if we are sure of the input; does not report errors */
    n = strtol(s, &end, 10);   /* errors can be tested (s == end) and processing can continue from end */
    r = sscanf(s, "%d", &n);   /* errors can be tested (r != 1), but the stopping point in s is not explicit (it can be obtained with %n) */
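The comparison between atoi, strtol and sscanf, and the advice to prefer snprintf, can be illustrated with a short sketch. The helper names parse_int and format_pair are hypothetical, and the choice to reject trailing characters and out-of-range values is an assumption made for this example, not something the lecture prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses the whole string s as a decimal int.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s) return 0;                  /* no digits were converted */
    if (*end != '\0') return 0;              /* trailing characters after the number */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                            /* value does not fit in an int */
    *out = (int)v;
    return 1;
}

/* Hypothetical helper: writes "name=value" into buf of the given size.
   Returns 1 if the whole text fit, 0 if it was truncated or an error occurred. */
int format_pair(char *buf, size_t size, const char *name, int value)
{
    int n = snprintf(buf, size, "%s=%d", name, value);
    return n >= 0 && (size_t)n < size;
}
```

snprintf returns the length the full text would have had, so comparing that value with size is how truncation is detected; the buffer itself is never overrun.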
Computer Programming (Programarea calculatoarelor), Lecture 13, 23 May 2006, Marius Minea: User-defined types

A type defines a set of values and the operations that can be performed on them.
Types other than (and more complex than) the basic ones are often needed. In C, enumeration, structure and union types can be defined with the syntax:
    keyword type-name(opt) type-specification declarator-list(opt) ;
where keyword ::= enum | struct | union
- type-name(opt): for later reference (prefixed by enum, struct or union)
- declarator-list(opt): objects of the new type can be declared directly
- the type name lives in a name space different from the common one for variables, types and functions. It is still clearer to use different names.

Enumeration types are used to give symbolic names to a series of numeric values.
Syntax: enum type-name(opt) { constant-list } declarator-list(opt) ;
- the constants may be given explicit values (and a value may be repeated)
    enum luni_curs {ian=1, feb, mar, apr, mai, iun, oct=10, nov, dec};   /* months with lectures */
- by default, the values increase in steps of 1, and the first value is 0
- the same constant name cannot be used in two different enumerations
- enumeration types are integer types => enumeration variables can be used just like integer variables
- the code is more readable than with separately declared constants
    int ore_lucru[7];                             /* working hours */
    enum zile_sapt {D, L, Ma, Mc, J, V, S} zi;    /* days of the week, starting with Sunday */
    for (zi = L; zi [...]

[...] The -> operator is equivalent to indirection followed by selection:
    pointer->field_name is equivalent to (*pointer).field_name
The . and -> operators have the highest precedence, the same as () and [].
Mind the order of evaluation:
    p->x++     means (p->x)++
    ++p->x     means ++(p->x)
    *p->x      means *(p->x)
    *p->s++    means *((p->s)++)

In C, aggregate types can be combined arbitrarily (arrays of structures, structures with array fields, etc.)
Types should be defined so that they group the data logically.
Example: if two arrays have the same index range and the data at the same index are used together, grouping them in a structure is preferable:
    char *nume_luna[] = { "ianuarie", ..., "decembrie" };   /* month names */
    char zile_luna[] = { 31, 28, 31, 30, ..., 30, 31 };     /* days in each month */
    /* the following variant is preferable */
    typedef struct { char *nume; int zile; } tip_luna;
    tip_luna luni[] = { {"ianuarie", 31}, ..., {"decembrie", 31} };

A field of a structure cannot be a structure of the same type (the result would be a structure of infinite, undefined size!)
It can, however, be the address of a structure of the same type (a pointer)!
=> recursive, linked data structures (lists, trees, etc.)
    struct wl {                  /* a list of words */
        char *word;              /* the actual data */
        struct wl *next;         /* pointer to the same structure type */
    };
A binary tree with integers in its nodes:
    typedef struct t tree;       /* incomplete declaration */
    struct t {
        int val;
        tree *left, *right;      /* uses the name from the typedef */
    };

Integer fields can be declared with a specified number of bits (bit fields)
=> bits are tested and set directly through the field name, without defining masks or using bitwise operators
    field ::= int-type name : int-const ; | int-type : int-const ;
    struct packet {
        unsigned int : 2;            /* the first two bits are of no interest */
        unsigned int error: 1;       /* one bit, signals an error */
        unsigned int status: 3;      /* a 3-bit field */
        unsigned int : 0;            /* forces alignment to the next storage unit */
        unsigned int seq_no: 4;      /* a 4-bit sequence number */
    } pkt;
    if (pkt.error) { /* ... */ } else if (pkt.status == 5) { /* ... */ } else pkt.seq_no++;
The fields are declared unsigned int: whether a plain int bit field is signed is implementation-defined, and a signed 3-bit field could never compare equal to 5.

Unions are aggregates whose value can hold data of different types, depending on the case.
Syntax: similar to that of structures
    union type-name(opt) { field-list } declarator-list(opt) ;
The field list is, however, a list of variants:
- a structure variable contains all the declared fields
- a union variable contains exactly one of the given variants (the size of the type is given by the largest field)
- a union variable holds no information about which variant it represents
- this must be remembered explicitly in the program (in another variable)

Example: a lexical analyzer (the first phase of a compiler) returns:
- an integer code for each lexical token (keyword, operator, etc.)
- additional data for identifiers (the name) and constants (the value)
    enum tok { IDENT, INUM, FNUM, DO, IF, ..., PLUS, ..., COMMA, ... };
    typedef union {
        char *id;        /* character string for an identifier */
        int ival;        /* value of an integer constant */
        float fval;      /* value of a real constant */
    } lexvalue;
    enum tok token; lexvalue lv;
    switch (token) {
        case IDENT: printf("%s", lv.id); break;
        case INUM: printf("%d", lv.ival); break;
        case FNUM: printf("%f", lv.fval); break;
    }
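Since a union does not record which variant it currently holds, a common way to "remember this explicitly" is to pair the union with an enumeration tag inside a structure. The sketch below reuses the lexvalue union from the example; the wrapper type lex_token and the function format_token are hypothetical names introduced only for illustration, and the enumeration is reduced to the three kinds that carry a value.

```c
#include <assert.h>
#include <stdio.h>

enum tok { IDENT, INUM, FNUM };      /* reduced to the kinds that carry data */

typedef union {
    char *id;                        /* identifier name */
    int ival;                        /* integer constant */
    float fval;                      /* real constant */
} lexvalue;

/* Hypothetical wrapper: the tag says which union member is valid. */
typedef struct {
    enum tok kind;
    lexvalue val;
} lex_token;

/* Hypothetical helper: writes a textual form of the token into buf.
   Returns what snprintf returns, or -1 for an unknown kind. */
int format_token(const lex_token *t, char *buf, size_t size)
{
    assert(t != NULL && buf != NULL);
    switch (t->kind) {               /* read only the member selected by the tag */
    case IDENT: return snprintf(buf, size, "%s", t->val.id);
    case INUM:  return snprintf(buf, size, "%d", t->val.ival);
    case FNUM:  return snprintf(buf, size, "%.2f", t->val.fval);
    }
    return -1;
}
```

Keeping the tag and the union in one structure means they travel together, so a function receiving a lex_token cannot be handed a value without knowing how to interpret it.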
Computer Programming (Programarea calculatoarelor), Lecture 14, 30 May 2006, Marius Minea: Declarations. Separate compilation

Variables can also be declared outside functions. If no other specifiers appear before the type in a variable declaration:
=> a variable declared outside any function
- has memory allocated for the entire execution of the program
- is initialized only once (with the value given explicitly in the declaration, or implicitly with zero)
- is visible in the program text from its declaration onward
=> a variable declared inside a block (including a function body)
- exists only while the program executes that block
- is initialized with the given value on every entry into the block (or has an indeterminate value if the declaration specifies no initialization)
- is visible only inside that block

For every identifier, the compiler must determine its meaning.
Ordinary identifiers (variables, types, functions, enumeration constants) share a common name space (a variable and a function CANNOT have the same name).

Q1: Can an identifier be used at a given point of the program?
A: This is decided by the scope (of a declaration / of an identifier):
- file scope: for identifiers declared outside any block (any function), from the point of declaration to the end of the compiled file
- block scope: for identifiers declared in a block { } (function body, compound statement) and for the parameters of a function, from the point of declaration to the brace } that closes the block
An identifier can be redeclared in an inner block, hiding the outer declaration; it regains its previous meaning when the block ends.

Q2: Do two declarations of an identifier refer to the same entity?
A: This is decided by the linkage of the identifier (object or function):
- external: all declarations of the identifier in all the files that make up a program refer to the same object or function
  (for file-level declarations without a storage-class specifier, or a declaration with the extern specifier of an identifier that has not already been declared with internal linkage)
- internal: all declarations of the identifier in the current file refer to the same object or function; they do not propagate outside the file
  (for file-level declarations with the static storage-class specifier)
- no linkage: each declaration denotes a unique entity
  (for block-level declarations without the extern specifier)

Q3: What lifetime (storage duration) does an object have in the program?
A: There are 3 kinds: static, automatic and allocated (discussed later).
Throughout its lifetime, an object has a constant address and keeps the last value stored in it.
Static storage duration: for objects declared with external or internal linkage, or declared with the static storage-class specifier
- lifetime: the entire execution of the program
- the object is initialized only once, before the program starts executing
Automatic storage duration: for objects with no linkage that are not declared static
- lifetime: from entry into the associated block until the block ends
- on every recursive call, a new instance of the object is created
- an initialization in the declaration is repeated every time it is reached

Separate compilation:
- each file can be compiled separately into object format
- the files are then linked to create the executable
- every identifier must be declared in each source file where it is used
- a variable is defined (possibly with an initializer) in a single file, and declared (possibly with the extern specifier) in the others
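The difference between static and automatic storage duration described above can be observed in a few lines. This is only an illustrative sketch; the names calls_total, next_id, fresh_value and depth are hypothetical.

```c
#include <assert.h>

int calls_total = 0;          /* file scope: external linkage, static storage duration */

int next_id(void)
{
    static int id = 0;        /* no linkage, but static storage: initialized once, keeps its value */
    return ++id;
}

int fresh_value(void)
{
    int v = 0;                /* automatic: created and initialized again on every call */
    return ++v;
}

int depth(int n)
{
    int local = n;            /* each recursive call gets its own instance */
    calls_total++;
    if (n == 0) return 0;
    int below = depth(n - 1);
    assert(local == n);       /* the deeper calls did not touch this instance */
    return below + 1;
}
```

Calling next_id twice yields 1 and then 2, while fresh_value yields 1 every time, because only the static object survives between calls.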
Example: data types together with their functions
- a .h file declares the necessary types and functions
- a .c file defines (implements) the functions
- the program that uses them includes the .h file and is compiled together with the .c file (or linked with the object file produced from it)
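The split between a .h file and a .c file can be sketched as follows. To keep the example in a single block, the two parts are shown in one translation unit, with comments marking where each would live; the file names stack.h and stack.c and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would go in the header, e.g. stack.h: types and function declarations ---- */
#define STACK_MAX 16
typedef struct { int items[STACK_MAX]; int top; } int_stack;
void stack_init(int_stack *s);
int stack_push(int_stack *s, int x);     /* returns 1 on success, 0 if the stack is full */
int stack_pop(int_stack *s, int *x);     /* returns 1 on success, 0 if the stack is empty */

/* ---- would go in the implementation, e.g. stack.c: function definitions ---- */
/* static at file level gives internal linkage: the helper is invisible to other files */
static int is_full(const int_stack *s) { return s->top == STACK_MAX; }

void stack_init(int_stack *s)
{
    assert(s != NULL);
    s->top = 0;
}

int stack_push(int_stack *s, int x)
{
    if (is_full(s)) return 0;
    s->items[s->top++] = x;
    return 1;
}

int stack_pop(int_stack *s, int *x)
{
    if (s->top == 0) return 0;
    *x = s->items[--s->top];
    return 1;
}
```

A client program would include only the header part and be linked with the object file produced from the implementation part; it could call stack_push and stack_pop but not is_full.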
R: 3 feluri diferite: static, automatic si alocat (discutat ulterior) Pe intreaga durata de viata, un obiect are o adresa constanta si isi pastreaza ultima valoare memorata Durata de memorare : pentru obiecte declarate cu tipul de legatura extern sau intern, sau declarate cu specificatorul de memorare static -timp de viata: intreaga executie a programului - obiectul e , inainte de lansarea in executie Durata de memorare : pentru obiecte fara legatura - timp de viata: de la intrarea in blocul asociat pana la incheierea sa - la fiecare apel recursiv, se creaza o noua instanta a obiectului - o eventuala initializare in declaratie e repetata de cate ori e atinsa - fiecare fisier poate fi compilat separat in format obiect - apoi fisierele sunt legate (linkeditate) pentru a crea executabilul - orice identificator trebuie declarat in fiecare fisier sursa unde e folosit - o variabila va fi definita (evtl cu initializare) intr-un singur fisier, si declarata (evtl cu specificatorul extern) in celelalte (ex tipuri de date cu functiile lor) -intr-un fisier h se declara tipurile si functiile necesare -intr-un fisier c se definesc (implementeaza) functiile - programul care le foloseste va include fisierul h si va fi compilat cu fisierul c (linkeditat cu fisierul obiect rezultat din el) Programarea calculatoarelor Curs 14 Marius Minea Programarea calculatoarelor Curs 14 Marius Minea Programarea calculatoarelor Curs 14 Marius Minea URL: http:  www elsevier nl locate entcs volume23 html 13 pages Model Checking Semi-Coiitinuous Time Models Using BDDs Sergio Campos,a Marcio Teixeira,a Marius Minea,b Andreas Kuehlmann,c Edmund Clarkeb a Univ Federal de Minas Gerais, Dept de Ciencia da Computagao, Brasil {scampos,mto}@dcc ufmg br b Carnegie Mellon University, School of Computer Science, USA {marius, emc}@cs emu edu c iBM T J Watson Research Center, USA kuehl@watson ibm corn Abstract The verification of timed systems is extremely important, but also extremely diffi-cult Several methods 
have been proposed to assist in this task, including extensions to symbolic model checking. One possible use of model checking to analyze timed systems is to model the passage of time as the number of transitions taken and to apply quantitative algorithms to determine the timing parameters of the system. The advantage of this method is its simplicity and efficiency. In this paper we extend this technique in two ways. First, we present new quantitative algorithms that are more efficient than their predecessors. The new algorithms determine the number of occurrences of events in all paths between a set of starting states and a set of final states. We then use these algorithms to introduce a new model of time, in which the passage of time is dissociated from the occurrence of events. With this new model it is possible to verify systems that were previously thought to require dense time models. We use the new method to verify two such examples previously analyzed by the HyTech tool: a steam boiler example and a fuel injection controller.

© 1999 Published by Elsevier Science B. V.

1 Introduction

Computers are frequently used in applications where failures can have severe consequences, such as in the control of industrial machinery or transportation equipment. In these applications, the computer system must not only produce the correct result, but must do so in a timely fashion. For example, a command to apply the brakes of a car or to turn an airplane in a certain direction cannot be late, otherwise an accident may occur. Such failures cannot be tolerated, making the correctness of these systems an extremely important issue.

However, verification of such systems is a very complex problem, made even harder by timing requirements. Several methods have been proposed to accomplish this task. One method that has obtained significant success is model checking. In this technique the system being verified is modeled as a state-transition graph and properties of the system are expressed
as temporal logic formulas. The verification procedure consists of a search on the state space of the graph to determine which states satisfy the properties.

Original model checkers were not designed to verify timing characteristics. Several extensions have been proposed to express and verify such properties. The first and simplest is to associate each transition with the passage of one time unit and to determine elapsed time by counting the number of transitions between events. This technique assumes a discrete time model. Its main advantage is its simplicity and extremely efficient implementation, particularly in BDD-based symbolic model checkers such as SMV or Verus.

Another approach is to use a continuous time model, in which events can happen at any moment in the dense time domain, e.g., timed automata. Since in this case the state space is inherently infinite, model checking entails constructing a finite equivalent model, the complexity of which can be quite high. These models, as well as the verification algorithms, are considerably more complex than in the discrete time case. Initial tools were unable to handle models with more than hundreds or thousands of states. Current tools are significantly more efficient, but verifying timed automata is still much more expensive than verifying discrete time models.

However, discrete time models have one major disadvantage compared with continuous time models: they are limited in expressing the semantics of event sequences that happen in short periods of time. For example, if the occurrence of an event a triggers an alarm b and an immediate response c, we can model these events as happening simultaneously or as taking at least two time units to occur. This may not correspond to reality, however. It may be the case that after event a has occurred but before alarm b, another event d occurs that would change response c. But if a, b and c happen at the same time, this possibility would not be present. On the other hand, if it takes 1 time
unit between a and b, it would not be possible for d to occur between a and b. For this reason discrete time models cannot be used in some applications where accuracy is essential.

The proposed method overcomes this problem by using zero-length transitions to model the occurrences of events without time passing. The passage of time then occurs in discrete steps using unit-length transitions. The advantage of this new model is that it removes the limitation on event orderings of the discrete time model. For example, it is now possible to let the events a, b, c and d described previously occur in time zero while preserving their order, and only let time elapse after all events have occurred. We argue that this enables the verification of many systems that were previously thought to require dense time models.

In order to determine the time between events in the semi-continuous time model we use quantitative timing analysis. Of particular interest are the condition counting algorithms, which count the minimum and maximum number of occurrences of a specific event in a given set of intervals. In this work these algorithms are used to count the minimum and maximum number of unit transitions on paths of interest, thereby computing the time elapsed between events. We propose new condition counting algorithms that are significantly more efficient than the previous ones. These algorithms allow verification to be done as efficiently as for the simple discrete time case. They are similar to the fixpoint computation used in model checking for untimed systems, and as such can be implemented efficiently using BDDs.

To demonstrate the expressive power and efficiency of the method we have verified two examples of systems in which high accuracy is necessary to achieve correct results. The first is the steam boiler example. This example, while small, demonstrates that the proposed model can be used to verify systems which are not usually considered in the realms of
discrete time. We have then verified an automotive engine controller developed for Magneti-Marelli that has been previously verified by HyTech. We have modeled the controller that identifies that the driver has released the accelerator and regulates the reduction of fuel injection. This identification is a complex time-critical function of the position of several sensors. If the timing of the events that take place during its execution is wrong, the algorithm may not converge and the controller can malfunction. We have verified both examples using Verus, demonstrating the effectiveness of the proposed method.

2 Related Work

A precursor to the presented analysis method has been developed in the real-time model checker Verus. This tool implements quantitative timing analysis algorithms that determine the timing characteristics of a system by counting the time between events or the number of occurrences of events in given intervals. The method has been used to verify large and complex timed systems such as an aircraft controller, a robotics controller and the PCI local bus. However, the condition counting algorithms used in that context require the augmentation of the state space with an additional integer time variable, which adds a significant overhead to verification. The new algorithms do not require this construct and are efficiently implemented using BDDs.

The occurrence of events without the passage of time has been discussed in prior work, but that work does not consider a symbolic implementation using BDDs and is not as efficient. It also does not use quantitative analysis algorithms and cannot generate the same type of information as the method proposed here.

A significant body of research exists on continuous-time models. One of the most widely used models is timed automata, which add real-valued clock variables to represent time. Clocks evolve at the same rate, modeling the passage of time, and formulas can refer to the values of the clocks to express timing properties.
Verification is then performed on a finite-state quotient model, such as the region graph or the zone automaton. However, the expressive accuracy comes with a significant increase in complexity, and a significant effort in the development of continuous-time model checkers has been devoted to dealing with the state explosion problem.

The trade-offs in expressiveness and efficiency between discrete and continuous time raise the question of when a discrete-time approximation is sufficient to model all continuous-time behaviors of a system. This problem has been analyzed in prior work that introduces the notion of digitizability and proves that such a reduction is possible for timed transition systems, for verification of properties such as time-bounded invariance and time-bounded response. More recent work shows that a reduction to discrete time can be performed for acyclic combinational circuits, but not for all cyclic ones. These can only be reduced under the constraint that no strict inequality is used in their design.

3 Condition Counting Algorithms

Our method relies on the ability to count some transitions on a path but not necessarily all of them. In order to accomplish this, we use the algorithms described in this section. The original algorithms used in our method to verify real-time systems determine the length of a path leading from a set of starting states to a set of final states. But to verify semi-continuous time models we also need to compute the minimum and maximum number of times a given condition holds on any path from start to final. In earlier work we presented algorithms that compute this information. However, those algorithms required an augmentation of the state space with a counter to store intermediate results. This made the algorithms very expensive in some cases. The algorithms described in this section do not suffer from this limitation.

We require that every state of the model has at least one outgoing transition. We also assume that any path beginning in start reaches a
state in final in a finite number of steps. This is necessary so that the minimum and maximum are well-defined, and it can be checked using the maximum algorithm from earlier work. We also consider only reachable states, which can be achieved by intersecting start with the set of reachable states computed a priori.

Minimum Condition Counting

The minimum condition count algorithm computes the minimum number of states satisfying a given condition cond over all paths that start in a state in start and end in a state in final. Any paths that start in start but do not reach final in a finite number of steps are excluded from this computation. In particular, if no path from start ever reaches final, the algorithm returns the special value NOPATH.

The algorithm looks for paths beginning in start that have an increasing number of occurrences of cond. Each iteration consists of two phases. The first is a forward traversal through states that do not satisfy cond. This traversal is performed until all states (not satisfying cond) reachable from the current frontier are found. If final has not been reached yet, the frontier is expanded by one step to states that satisfy cond and the condition counter is incremented. The algorithm iterates until final is found, or all reachable states are visited.

The algorithm must differentiate between states that do not satisfy cond and those that do, and similarly, between transitions leading to these states. We use subscripts 0 and 1 respectively for the two types of states and transitions. For example, $start_0$ is the set of initial states that do not satisfy cond, and $start_1$ is the set of initial states that satisfy cond:

$$start_0 = start \cap \neg cond \qquad start_1 = start \cap cond$$

Furthermore, if $N(s, s')$ is the transition relation, we denote by $T_0(S)$ and $T_1(S)$ the sets of states reached by transitions from a state in $S$ that lead to states not satisfying cond and to states satisfying cond, respectively:

$$T_0(S) = \{ s' \mid \exists s \in S .\; N(s, s') \wedge s' \notin cond \}$$
$$T_1(S) = \{ s' \mid \exists s \in S .\; N(s, s') \wedge s' \in cond \}$$

The argument about the correctness of the algorithm follows from invariants stating that $R'$ at the $i$th iteration contains the set of all states that can be reached as endpoints of finite intervals that start in start, have no state in final (except perhaps the last one), and have $i$ or fewer states satisfying the condition. The proof can be found in the full version of the paper.

Maximum Condition Counting

The maximum condition count algorithm computes the maximum number of states satisfying a given condition cond over all paths that begin in a state in start and end in a state in final without previously traversing a state in final. If there is a path beginning in start that goes through cond infinitely often without reaching final, the algorithm returns infinity.

The basic idea behind the algorithm is to find paths with increasing condition count whose states are all within $\neg final$. The condition count of the longest path satisfying this condition and starting in start is the desired maximum. As in the mincount algorithm, we consider transitions into states that satisfy cond and transitions into states that do not satisfy cond separately. This algorithm, however, performs a backward search, and uses the reverse image of the transition relation. In this case $B_0(S')$ is the set of states satisfying neither cond nor final that lead to a state in $S'$ in one step. Similarly, $B_1(S')$ is the set of states satisfying cond but not final that lead to a state in $S'$ in one step. Note that final only appears implicitly in the algorithm, in the definitions of $B_0$ and $B_1$:

$$B_0(S') = \{ s \mid \exists s' \in S' .\; N(s, s') \wedge s \notin final \wedge s \notin cond \}$$
$$B_1(S') = \{ s \mid \exists s' \in S' .\; N(s, s') \wedge s \notin final \wedge s \in cond \}$$

    proc mincount(start, cond, final)
      i = 0; R = ∅; R' = start_0;
      do
        do
          if (R' ∩ final ≠ ∅) return i;
          R = R';
          R' = T_0(R') ∪ R';
        while (R' ≠ R);
        R' = T_1(R') ∪ R';
        if (i = 0) R' = R' ∪ start_1;
        i = i + 1;
      while (R' ≠ R);
      return NOPATH;

    proc maxcount(start, cond, final)
      i = 0; R' = cond;
      do
        R'' = R';
        do
          R = R';
          R' = R' ∪ B_0(R');
        while (R' ≠ R);
        if (R' ∩ start = ∅) return i;
        R' = B_1(R');
        i = i + 1;
      while (R' ≠ R'');
      return ∞;

Fig. 1. Minimum and maximum condition count algorithms

Again, we argue the correctness of the algorithm using an invariant similar to the previous one. It states that at the $i$th iteration $R'$ is the set of all states that are the start of a finite path which has no states in final (except possibly the last one), and which has $i + 1$ states that belong to cond. The proof can be found in the full version of the paper.

4 Semi-Continuous Time

The basic idea of the proposed method is to allow zero-length transitions that model the occurrence of events without time passing, thus making the occurrence of events independent of the passage of time. To allow zero-length transitions we have created a special variable t in the model of the system being verified. Time passage is controlled by enabling unit-length transitions only when t is true, and enabling zero-length transitions only when t is false.

Parallel composition of processes under the new model is defined as follows. Unit transitions have to occur synchronously, that is, all processes must execute a unit transition in order for time to elapse. Zero-length transitions, on the other hand, occur asynchronously. When a process performs a zero-length transition, all other processes are not executing. As a consequence, zero-length transitions are always enabled. Unit transitions, however, are only enabled when there is at least one unit transition enabled in each process. This parallel composition model satisfies one important invariant: the passage of time is identical in all processes.

A symbolic implementation of this parallel composition model is straightforward given the traditional parallel composition algorithms used in BDD-based tools: conjunction of transition relations for synchronous composition and disjunction for asynchronous composition. Under the new model we must
first differentiate between unit and zero-length transitions. Given $TR_a$, we define $TR0_a$ ($TR1_a$) as the transition relation for zero-length (unit) transitions in $P_a$. We can then define the global transition relation for a model with processes $P_a$ and $P_b$ as:

$$TR = (TR1_a \wedge TR1_b) \vee (TR0_a \vee TR0_b)$$

From this expression we can see that whenever unit transitions are enabled in all processes they are also enabled in the composed model. The expression also guarantees that zero-length transitions enabled in some process are always enabled in the composed model.

The only other condition that must be imposed in this model is that time eventually changes. This can be ensured by forbidding zero-length loops, which can be enforced by a syntactic check.

To determine how much time has elapsed between events, we use the condition counting algorithms. For example, mincount[a, t = true, b] determines the minimum time between events a and b. Similarly, the maxcount algorithm can be used to determine the longest time between a and b.

5 Expressive Power of the Proposed Method

The proposed method does not have the same expressive power as a dense time model. Our method uses a different "discretization" of dense time, but the final model is still discrete. It has been proven that there exist systems which cannot be discretized without changing their behavior. In particular, it has been shown that the following circuit has behaviors that cannot be captured by any discretization. It has four signals $x_0$, $x_1$, $x_2$ and $x_3$, and transitions which assign values to them as: $x_1 = \neg x_0$, $x_2 = \neg x_0$ and $x_3 = \neg x_0$. Each transition takes time between 0 and 1 units to occur. Let $t_1$, $t_2$ and $t_3$ be the times when each transition occurs. A possible behavior of the circuit could have transition times satisfying 0 [...]

[...] $\mathbf{E}[\neg t \;\mathbf{U}\; (e_2 \wedge \neg e_3 \wedge \neg t \wedge \mathbf{E}[\neg t \;\mathbf{U}\; e_3])]$, where $t$ is true in unit-length transitions and false in zero-length ones. Frequently, the fact that the total time elapsed is less than one time unit is not encoded in the formula. In this case the
formula can be simplified to $(e_1 \wedge \neg e_2)$ EF [...]

[...] M2} emergencystop

Using Verus, we have been able to verify that the controller maintains the water level within the required bounds. This result is the same as the one obtained in the earlier HyTech analysis. The verification took 2.3 seconds and 1.1 MBytes of memory on a Pentium II system.

We have also verified other properties of the steam boiler using the mincount and maxcount algorithms. For example, an important parameter of the system is the size of $\Delta$, the frequency of communication between controller and plant. Using Verus we have been able to determine that $\Delta = 6$ also satisfies the safety requirements, but $\Delta = 7$ does not. If communication between controller and units is delayed by up to one second, safety is maintained, but longer delays can cause safety problems. Several other parameters have been identified, including, e.g., the minimum and maximum times needed for water to go from the minimum to the maximum level. The interval is $[20, \infty)$ seconds, meaning that the water may never reach the maximum water level from the minimum water level, but it never takes less than 20 seconds.

6.2 Automotive Engine Controller in Cutoff Mode

In order to demonstrate the efficiency of the method we have verified an automotive engine controller in cutoff mode that was previously verified by HyTech. We have studied the cutoff mode, where we consider control of the engine once the driver has released the accelerator pedal. The system must then guarantee that the engine will deliver zero torque within a certain time. The control objective is to reach injection cutoff while minimizing acceleration discomfort. If fuel injection is abruptly cut off, the vehicle may exhibit very undesirable acceleration oscillations. If fuel injection remains on for a long time, the car does not decelerate. In order to minimize these problems, the controller makes intelligent decisions about when and how to cut off fuel.

The system consists of the engine, which includes the driveline and the
cylinders, and its controller. The engine has four cylinders, each of which cycles in lockstep through four phases in the following order: intake (I), compression (C), expansion (E), and exhaust. The controller must make its decision on injection (modeled by the binary output variable j) at the beginning of the preceding exhaust phase. If fuel is injected into a cylinder, the cylinder produces torque on its next expansion phase. Thus the driveline does not react to a control decision until three phases later.

The controller sets the value of j at each phase change, with the function F modeling the decision to inject fuel or not. The function F is defined over a transformed state space (over the variables $x_1$, $x_2$, $x_3$, $x_4$) that helps isolate the fundamental modes related to acceleration oscillations. Powertrain oscillations are due to the pair of complex conjugate poles, which are related to the $x_2$ and $x_3$ components. Thus, our analysis concentrates on the $x_2$-$x_3$ subspace, where the encirclements of the origin correspond to oscillations.

The automotive engine controller should meet the requirement that for a given initial condition the state is close to the origin (injection cutoff) within a bounded number of phases (convergence). To show the convergence requirement using the same parameters as in the HyTech study, we have computed the maximum time from an initial state until a trajectory is close to the origin.

We have used Verus to verify the requirements. The code for the example has been generated automatically from the original HyTech code using a Perl script written for this purpose. We have divided the $x_2$-$x_3$ state space into 25 x 25 partitions, increasing the accuracy of the rectangular approximations. In our model, phase changes occur in unit time, and all other events happen in time zero. We have determined that the maximum time until a trajectory is close to the origin is 29 steps, the same result obtained by HyTech.
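The kind of query used here (the least or greatest number of unit-time steps between a set of start states and a set of final states) can be illustrated on a small explicit-state graph. The Python sketch below is one set-based reading of the mincount and maxcount procedures of Fig. 1, with plain Python sets standing in for BDDs. It is not Verus code: the toy graph, the state names, and the helper names `image` and `preimage` are hypothetical, and the initialization of maxcount with `cond` is an assumption taken from the reconstructed figure.

```python
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]   # successor map standing in for N(s, s')
NOPATH = None                 # special value returned when final is unreachable
INFINITY = float("inf")


def mincount(graph: Graph, start: Set[str], cond: Set[str],
             final: Set[str]) -> Optional[int]:
    """Minimum number of cond-states on any path from start to final."""
    def image(states: Set[str], into_cond: bool) -> Set[str]:
        # T_1 when into_cond is True, T_0 otherwise.
        return {t for s in states for t in graph[s] if (t in cond) == into_cond}

    i = 0
    reached = start - cond                      # start_0
    while True:
        while True:                             # forward closure under T_0
            if reached & final:
                return i
            prev = reached
            reached = image(reached, False) | reached
            if reached == prev:
                break
        reached = image(reached, True) | reached  # one step into cond-states
        if i == 0:
            reached = reached | (start & cond)    # add start_1
        i += 1
        if reached == prev:                       # nothing new: no path exists
            return NOPATH


def maxcount(graph: Graph, start: Set[str], cond: Set[str],
             final: Set[str]) -> float:
    """Maximum number of cond-states on any path from start to final."""
    def preimage(targets: Set[str], from_cond: bool) -> Set[str]:
        # B_1 when from_cond is True, B_0 otherwise; final states are excluded.
        return {s for s, succs in graph.items()
                if s not in final and (s in cond) == from_cond and succs & targets}

    i = 0
    frontier = set(cond)                        # assumed initial value
    while True:
        before = frontier
        while True:                             # backward closure under B_0
            prev = frontier
            frontier = frontier | preimage(frontier, False)
            if frontier == prev:
                break
        if not (frontier & start):
            return i
        frontier = preimage(frontier, True)     # one step back through cond
        i += 1
        if frontier == before:                  # cond can recur forever
            return INFINITY


# Hypothetical toy model: e1 and e2 are zero-length events, tick1 and tick2
# are states entered by unit-length transitions (the states where t holds).
toy: Graph = {
    "a": {"e1"},
    "e1": {"e2", "tick1"},
    "e2": {"tick1"},
    "tick1": {"tick2", "b"},
    "tick2": {"b"},
    "b": {"b"},
}
ticks = {"tick1", "tick2"}
print(mincount(toy, {"a"}, ticks, {"b"}), maxcount(toy, {"a"}, ticks, {"b"}))
```

On this toy graph the shortest route from a to b passes through one tick state and the longest through two, so the elapsed time lies between 1 and 2 units regardless of how many zero-length events occur along the way.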
Verification was performed very efficiently, but at the same time it has shown a limitation of our method. The source file for this example is extremely large: it has more than 250,000 lines of Verus code. It is, to the authors' knowledge, the largest example verified by symbolic model checking. It took Verus several hours to compile this code into a transition graph representing the system. Once the model was generated, however, verification was performed in only 18 seconds.

The reason for the long compilation time seems to be related to the fact that for systems which involve large constants, discretization can lead to a large state space representation even when using BDDs. This is caused by the binary encoding of integer values. In some of these cases, continuous time models may be more efficient, since the representation is less dependent on time granularity. However, for models whose timing constants are well-behaved, a discrete-time model with a uniform BDD-based representation can present significant gains in efficiency. In this case it seems that both effects were present. The values represented for $x_2$ and $x_3$ are well behaved, but they are large, as is the number of operations that have to be performed on them, making the generation of the model slow, but possible. Verification, on the other hand, was performed extremely fast, showing that the complexity is related to the manipulation of large integer values, not to the representation of time.

7 Conclusions

In this work we propose a new algorithm to perform quantitative timing analysis of models that is more efficient than its predecessor. This algorithm, called condition counting, counts the minimum and maximum number of occurrences of events between two events start and final. The algorithm is used to implement an alternative method to represent time which enables the verification of systems that were previously considered to require dense time models. Verification under the new model can be
performed as efficiently as for discrete time models. The proposed method has been implemented in Verus, but it can be used in most BDD-based symbolic model checkers. Two examples that had previously been verified by the dense-time tool HyTech have been modeled and verified in Verus. Future work includes a more accurate characterization of the expressive power of the method.

Acknowledgments

We would like to thank Howard Wong-Toi for the many useful discussions about the examples that have been verified in HyTech.

Int J STTT (1999) 2: 279-287
© 1999 Springer-Verlag

State space reduction using partial order techniques

E. M. Clarke (1,*), O. Grumberg (2,**), M. Minea (1), D. Peled (3)
(1) Carnegie Mellon University, School of Computer Science, Pittsburgh, PA 15213-3891, USA
(2) Department of Computer Science, The Technion, Haifa 32000, Israel
(3) Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974-2070, USA

Correspondence to: Doron Peled

* This research is sponsored by the Semiconductor Research Corporation (SRC) under Contract No. 97-DJ-294, the National Science Foundation (NSF) under Grant No. CCR-9505472, and the Defense Advanced Research Projects Agency (DARPA) under Contract No. DABT63-96-C-0071. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of SRC, NSF, DARPA, or the United States Government.
** This research was partially supported by the fund for the promotion of research at the Technion.

Abstract. With the advancement of computer technology, highly concurrent systems are being developed. The verification of such systems is a challenging task, as their state space grows exponentially with the number of processes. Partial order reduction is an effective technique to address this problem. It relies on the observation that the effect of executing transitions concurrently is often independent of their ordering. In this paper we present the basic principles behind partial order reduction and its implementation.

Key words: State space reduction - Partial order reduction

1 Introduction

One of the main problems in automatic verification of systems is the so-called state space explosion problem. For many types of systems, the number of possible states during system execution grows exponentially with the size of the system and the number of its component parts. This quickly leads to models whose size exceeds the current capabilities of verification tools.

Partial order reduction is a technique that addresses this problem for concurrent asynchronous systems by constructing a smaller state space that is searched by the verification (model checking) algorithms. In general, asynchronous systems are described using an interleaving model of computation. Concurrent events are modeled by allowing their execution in all possible orders relative to each other, creating a large number of possible states and paths. However, specifications typically do not distinguish between all different orders. Partial order reduction considers only a restricted set of behaviors of the system, while guaranteeing that the ignored behaviors do not add any new information.

In this survey we will describe a method of partial order reduction. The main goal of this paper is to provide an intuitive description of the main ideas and present some techniques that can be used for implementation.

Reducing the state space by using commutativity between concurrent transitions was suggested by several researchers. In his Ph.D. thesis, Overman suggested a method to avoid exploring all the states of a concurrent system. However, this method was only applied to systems without loops. Katz and Peled suggested a proof system for concurrent systems that takes the commutativity between transitions into account. The core of the deduction system was based on using proof rules that asserted properties of sequences which are generated by taking certain subsets of successors from each state.

In the last decade, several researchers have developed methods to apply reduction principles in model checking. These techniques include the stubborn sets method of Valmari, the persistent sets method of Godefroid and Wolper, and the ample
sets method of Peled. These works contain similar ideas, although they differ with respect to the details of the suggested reduction. We will present here the ample sets method.

The name partial order reduction reflects a connection between the initial versions of these reductions and partial order semantics. Roughly, a partially ordered execution is represented by a set of events and a causality relation between them. The causality relation indicates that some events must precede others, while events that are not constrained by this relation are independent and can happen in any order. In contrast, in a total ordering on events, any given event must either precede or follow any other event. Some versions of partial order reduction guarantee that the reduced state space includes, for each such partially ordered execution, at least one linearization (completion into a total order). However, most current methods do not maintain this relation any more.

2 Fundamental notions

The systems that we analyze are modeled as state transition graphs. If $S$ is the set of states, a transition is a relation $\alpha \subseteq S \times S$, i.e., it can be taken between different pairs of states. A state transition graph is then defined as a tuple $M = (S, S_0, T, L)$, where $S_0 \subseteq S$ is a set of initial states, $T$ is a set of transitions $\alpha \subseteq S \times S$, and $L : S \rightarrow 2^{AP}$ is a labeling function that assigns to each state a subset of some set $AP$ of atomic propositions.

A transition $\alpha \in T$ is enabled in a state $s$ if there exists a state $s'$ such that $(s, s') \in \alpha$ (or in other words $\alpha(s, s')$ holds). If for any state $s$ there is at most one state $s'$ such that $\alpha(s, s')$, we call $\alpha$ a deterministic transition. In this case we can view $\alpha$ as a partial function on states instead of a relation and write $s' = \alpha(s)$ instead of $\alpha(s, s')$. The following presentation considers only deterministic transitions, without further explicit mention.

We reason about execution
sequences of the system, called paths. A path in a state-transition graph $M$ is a finite or infinite sequence $\sigma = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots$ such that $s_{i+1} = \alpha_i(s_i)$ for every $i$.

In asynchronous systems, the number of transitions occurring between two events has no direct relationship to the time delay between them. Furthermore, transitions which are concurrent in the system appear serialized in some order in the interleaving model. These observations argue for a specification which cannot distinguish between sequences of identically labeled states on an execution path of the system.

We call two infinite paths stuttering equivalent (Fig. 1) if they have identical state labelings after, in each of them, any finite sequence of identically labeled states is collapsed to a single state. In other words, two infinite paths $\sigma = s_0 \to s_1 \to \cdots$ and $\rho = r_0 \to r_1 \to \cdots$ are stuttering equivalent if there are two infinite sequences of integers $0 = i_0 < i_1 < i_2 < \cdots$ and $0 = j_0 < j_1 < j_2 < \cdots$ such that for every $k \geq 0$,

$$L(s_{i_k}) = L(s_{i_k+1}) = \cdots = L(s_{i_{k+1}-1}) = L(r_{j_k}) = L(r_{j_k+1}) = \cdots = L(r_{j_{k+1}-1}).$$

The indices $i_k$ and $j_k$ are the starting points of identically labeled subsequences of states in the two paths, respectively. The stuttering equivalence relation between $\sigma$ and $\rho$ is denoted by [...]

An interpretation of an LTL formula is an infinite word $\xi = x_0 x_1 x_2 \ldots$ over the alphabet $2^{AP}$, i.e., a mapping from the naturals to $2^{AP}$. We write $\xi^i$ for the suffix of $\xi$ starting at $x_i$. The semantics of LTL is as follows:
- $\xi \models p$ iff $p \in x_0$, for $p \in AP$,
- [...]
- $\xi \models \varphi \, U \, \psi$ iff there is an $i \geq 0$ such that $\xi^i \models \psi$ and $\xi^j \models \varphi$ for all $0 \leq j < i$.

[...] false. We also use the following abbreviations: $\varphi \vee \psi = \neg((\neg\varphi) \wedge (\neg\psi))$, $\Diamond\varphi = \mathit{true} \, U \, \varphi$, $\Box\varphi = \neg\Diamond\neg\varphi$.

Given a state transition graph $M$ and an LTL formula $\varphi$, the model checking problem for $M$ and $\varphi$ is to verify that for every initial state $s_0 \in S_0$ and every path $\xi$ starting in $s_0$, it is true that $\xi \models \varphi$. If this holds, we write $M \models \varphi$.

An LTL formula $\varphi$ is invariant under stuttering if for any two paths [...]

[...] enabled in some local state $s_j$, that changes the value of the labeling function: $\alpha_j(s_j) = s'_j$, $L(s_j) = \emptyset$, $L(s'_j) = \{p_j\}$, for some $p_j \in AP$. The concurrent transitions $\alpha_j$ can be ordered in $n!$
possible ways, producing a total of $2^n$ different states. Yet it is possible that the specification only needs to establish a property that links the initial global state $(s_1, \ldots, s_n)$ with the resulting state $(s'_1, \ldots, s'_n)$, irrespective of the path taken between these. In this case, it is much more efficient to consider only one particular ordering and the corresponding $n + 1$ states.

Typically, the reduced model is constructed by performing a modified depth-first search on an explicit state representation of the system. Model checking is done in a separate phase, on the resulting reduced state transition graph. It is also possible to construct the reduced model on the fly, while performing model checking. Other variations are to use breadth-first search instead of depth-first search, or to combine partial order reduction with symbolic model checking. A common point for all variants is that the reduced state space is constructed directly, without ever building the full state graph. Building the full graph would defeat the purpose of reduction, since it is likely to be too large to be constructed in the first place.

Consider, for the purpose of illustration, the case of depth-first search. A typical search that constructs the entire reachable state space would follow all transitions enabled at the current state in the search. With partial order reduction, only a subset of the enabled transitions is expanded at each state s. We will call this set ample(s).

To apply this method, we need a procedure to compute a suitable set ample(s) for every state s. First, in order to obtain a much smaller state graph, ample(s) has to be significantly smaller than enabled(s). On the other hand, to ensure the correctness of the reduction, ample(s) has to include enough transitions such that for each behavior in the full state graph there is an equivalent behavior in the reduced state graph. Finally, computing an ample set should be done with a reasonably small overhead so that verification
time is not increased compared to full state space search.

Since the key issue in partial order reduction is to select only a restricted number of orderings between transitions for analysis, the concept of transitions that can be reordered has to be formalized. This can be done by defining the key concept of an independence relation between transitions. Two transitions $\alpha, \beta \in T$ are independent if they satisfy the following two conditions for each state $s \in S$:

Enabledness: if $\alpha, \beta \in enabled(s)$ then $\alpha \in enabled(\beta(s))$ and $\beta \in enabled(\alpha(s))$.
Commutativity: if $\alpha, \beta \in enabled(s)$ then $\alpha(\beta(s)) = \beta(\alpha(s))$.

The enabledness condition expresses the fact that two independent transitions that are enabled at a given state cannot disable each other. Note that the definition given here allows independent transitions to enable one another. The commutativity condition states that the execution of two independent transitions in any order (which is guaranteed to be possible by the enabledness condition) leads to the same state. Two transitions are called dependent if they are not independent.

Consider the simple fragment of a state transition graph depicted in Fig. 2. If transitions $\alpha$ and $\beta$ are independent, a possible reduction would be to consider only the execution sequence $s \to s_1 \to s'$ and not the path $s \to s_2 \to s'$. However, this reduction may not necessarily be correct, either because the checked property can distinguish between the intermediate states $s_1$ and $s_2$, or because eliminating one of these states may cause some of its successors (which are significant for verification) not to be explored. Additional conditions for the correctness of the reduction are needed, and they will be described in the following.

Fig. 2. Independent transitions

To address the first of these two issues, we define what it means for a specification to distinguish between two states, by introducing a second key concept, that of invisible transitions. Recall that $L : S \to 2^{AP}$ is the labeling function that assigns to each state a set of atomic propositions. The specification does not necessarily refer to the entire set of atomic propositions; let $AP' \subseteq AP$ be the subset of atomic propositions referenced in the specification. We call a transition $\alpha$ invisible with respect to some subset $AP' \subseteq AP$ if its execution between any two states does not change the labeling with atomic propositions from $AP'$. Formally, the transition $\alpha \in T$ is invisible with respect to $AP'$ if for any two states $s, s' \in S$ such that $s' = \alpha(s)$ we have $L(s) \cap AP' = L(s') \cap AP'$. A transition is visible if it is not invisible. If the subset of atomic propositions $AP'$ is clear from the context (it is usually the set of atomic propositions contained in the specification), we will simply say that a transition is visible or invisible without explicitly mentioning that this is with respect to $AP'$.

4 Partial order reduction for LTL$_{-X}$

We have seen in the previous section that the properties of independence and invisibility for transitions and stuttering invariance for LTL$_{-X}$ formulas allow us to verify the specification for the given system on a reduced model, and thus avoid the generation of all states. The reduced model is constructed by selecting at each step a subset ample(s) of the transitions which are enabled at the current state s. We say that a node s is fully expanded if ample(s) = enabled(s).

We need a procedure that will determine a suitable set of ample transitions at each state. Rather than directly give an algorithm that solves this problem, in this section we will characterize the set ample(s) using a set of conditions. The next section continues by describing various heuristics that can be used to find ample sets that satisfy these conditions.

The first condition is trivial and guarantees that the search algorithm with reduction will make progress if the normal
search algorithm would:

C0 Emptiness. $ample(s) = \emptyset$ iff $enabled(s) = \emptyset$.

The next constraint is introduced to ensure that any path that is not included in the reduced state-transition graph can be transformed, based on the properties of independent transitions, into a path in the reduced model, and therefore the reduction does not omit any paths which are essential for verification.

C1 Ample decomposition. In the full state graph, on any path starting from some state s, a transition dependent on a transition from ample(s) cannot appear before some transition from ample(s) is executed.

To analyze the implications of C1, consider an arbitrary sequence of transitions $\sigma = \alpha_0 \alpha_1 \ldots$ that can be taken from some state $s_0$ in the full state transition graph. We outline the basic ideas of a construction that can be used to generate a path in the reduced model that contains all transitions from $\sigma$ (Fig. 3). More details of the construction and a proof of its correctness are given by Clarke et al.

Fig. 3. Reordering of transitions based on commutativity

(a) If $\alpha_0 \in ample(s_0)$, then $\alpha_0$ can be taken from $s_0$ in the reduced model, and the path prefix $s_0 \xrightarrow{\alpha_0} s_1$ belongs to the reduced model. The construction is continued inductively from $s_1$.
(b) If $\alpha_0 \notin ample(s_0)$, consider first the case where the transition sequence $\sigma$ contains some transition from $ample(s_0)$. Let $\beta$ be the first such transition appearing in $\sigma$. Then by condition C1 all transitions $\alpha_i$, with 0

(scholarships for the best; research)
Doctoral student: , research
mester: high (usually)

Austria: 720 €/year
Switzerland: approx. 1300 CHF/year
France: 130 - 700 €/year
Germany: approx. 1000 €/year (sometimes free)
Netherlands: 1620 €/year
Scandinavia: 0 (zero)
http://ec.europa.eu/education/study-in-europe/

From your own university
From the destination country
From the Romanian government
From private sources
From the host university

European Region Action Scheme for the Mobility of University Students
a program of the European Commission
simple, with fairly varied options
at most 1 year, with return and degree at UPT
Contact: the Department of Programs and International Relations and
The site of the UPT International Relations Dept. has the list of scholarships and deadlines

Romanian Government scholarships
Agence Universitaire de la Francophonie
Egide (Eiffel scholarships; autumn)
French government scholarships (March)
: DAAD (October)
Bavaria: BAYHOST (February)
other private scholarships (Bosch, Mummert, etc.)
roburse.ro, campusfrance.org, auf.org, www.egide.asso.fr, ambafrance-ro.org, daad.de
Other countries
Many of the scholarships first require

The scholarship application file
The job application file
both are

Only one person can be the best in the class year => only 1% can be in the top 1% (obviously)!
But: you can be among the best in your specialty! => Find the right specialization

Transcript: only the grade and the rank in the class year
but a person is more than a number!
A professor's recommendation
not just: "was good in my course"
but:
=> Working on a project is the best recommendation

There are also English-language programs in Scandinavia, the Netherlands, Switzerland, Germany, France, Italy
But many very good universities have programs in their own language!
If you know the language, do not limit yourselves!
Usually few (

(scholarships for the best; research)
Doctoral student: , research
mester: high (usually)

Austria: 0 - 720 €/year
Switzerland: 1100 - 1600 €/year
France: 200 - 700 €/year
Germany: approx. 1000 €/year (sometimes free)
Netherlands: 2000 €/year
Scandinavia: 0 (zero)

From your own university
From the destination country
From the Romanian government
From private sources
From the host university

European Region Action Scheme for the Mobility of University Students
a program of the European Commission
simple, with fairly varied options
at most 1 year, with return and degree at UPT
Contact: the Department of Programs and International Relations and
The UPT site has the list of agreements and the list of various scholarships

Romanian Government scholarships
Egide (Eiffel scholarships; Dec./Jan.)
French government scholarships (March)
Agence Universitaire de la Francophonie
The UPT double-degree agreement with
: DAAD (Nov. 15)
Bavaria: BAYHOST (Dec. 1)
other private scholarships (Bosch, Mummert, etc.)
Other countries
Many of the scholarships first require

The scholarship application file
The job application file
both are

Only one person can be the best in the class year => only 1% can be in the top 1% (obviously)!
But: you can be among the best in your specialty! => Find the right specialization

Transcript: only the grade and the rank in the class year
but a person is more than a number!
A professor's recommendation
not just: "was good in my course"
but:
=> Working on a project is the best recommendation

There are also English-language programs in Scandinavia, the Netherlands, Switzerland, Germany, France, Italy
But many very good universities have programs in their own language!
If you know the language, do not limit yourselves!
Usually few (

2, ..., = $m_{k,n}$ we analyze the collected runtime information and add new elements to the waitingQueue using the following two steps.

a) Try Forcing a Use: As we have mentioned, each intra-class def-use pair is represented as a tuple of the form $(m_{def}, m_{use}, F, C)$. Clearly, this information can be computed statically for all classes of the tested program. Assume now that the test $s_k$ has executed successfully. As explained in Section III-B2, for each object created during a test execution, we know what method performed the last definition of each field of that object. Consequently, for an object of class C referred to by a reference $v_{k,l}$ from $s_k$ we know the method $m_{def}$ that performed the last definition of the field F. Thus, for the object $v_{k,l}$ of class C we can try to force an invocation of any method $m_{use}$ of C containing a use of the field F. In other words, we try to invoke a method $m_{use}$ that might access the last value set for F and thus cover the intra-class def-use pair $(m_{def}, m_{use}, F, C)$. As a result of this step, several tuples having the form $(s_k, v_{k,l}, (m_{def}, m_{use}, F, C), forceUse)$ will be added to the waitingQueue.

Continuing with the example based on the class from Listing 1, as explained at the end of the previous section, by executing the single test available until now (see Listing 3), the last definition of field x on the instance var0 of SomeClass has been performed by the class constructor. It is easy to see that all def-use pairs in which the first method contains a definition of field x and the second method contains a use of that field are: (SomeClass, a, x, SomeClass), (SomeClass, b, x, SomeClass), (b, a, x, SomeClass), and (b, b, x, SomeClass). Thus, in order to cover new intra-class def-use pairs, we should try to generate one test in which we invoke method a on var0 and another in which we invoke method b on the same reference. Consequently, two tuples are added to the waitingQueue: (test1, var0, (SomeClass, a, x, SomeClass),
forceUse) and (test1, var0, (SomeClass, b, x, SomeClass), forceUse).

The execution of this selection step is controlled by several constraints:
1) a tuple is added to the waitingQueue if and only if the corresponding $(m_{def}, m_{use}, F, C)$ is not yet covered; the rationale is that if an intra-class def-use pair has been covered we should not try to cover it again;
2) if the executed sequence $s_k$ has been obtained by our guiding procedure, a new tuple is added to the waitingQueue only if the sequence was not created by the same Try Forcing a Use step. The reason is that part of sequence $s_k$ has already been used by this step, and reapplying the step will likely duplicate tests. However, there is an exception to this constraint: if $s_k$ has covered new intra-class def-use pairs, we allow reapplying the same step because some uses might become reachable (e.g., a use in an object may alter the state of an object component, enabling the execution of another use in that object).

b) Try Forcing a Def: While the purpose of the previous step is to try to exercise invocations of methods performing new uses of some fields of the target objects, the purpose of the second step is to call methods in the hope of executing new definitions of some fields. As mentioned, we can statically compute all tuples $(m_{def}, m_{use}, F, C)$ representing the intra-class def-use pairs for every class C. Moreover, during the execution of our modified random test generation, we know which tuples $(m_{def}, m_{use}, F, C)$ have already been covered. Consequently, for each reference $v_{k,l}$ from a sequence $s_k$ pointing to an object of type C, we can easily determine which tuples $(m_{def}, m_{use}, F, C)$ have not been covered yet and thus which methods $m_{def}$ we should try to invoke on the object $v_{k,l}$. As a result of this step, several tuples of the form $(s_k, v_{k,l}, (m_{def}, m_{use}, F, C), forceDef)$ are added to the waitingQueue.

Recall that the intra-class def-use pairs for Listing 1 are (SomeClass, a, x, SomeClass),
(SomeClass, b, x, SomeClass), (b, a, x, SomeClass), and (b, b, x, SomeClass). Moreover, as shown at the end of Section III-B2, during the execution of the single test available so far (see Listing 3), we have a single object of class SomeClass referred to by var0, and no intra-class def-use pair has been covered yet. To increase the likelihood of covering these pairs we should try to execute the definitions of these pairs. Thus, we should generate tests which invoke method b on var0 (footnote 2). As a result, the following tuples are added to the waitingQueue: (test1, var0, (b, a, x, SomeClass), forceDef) and (test1, var0, (b, b, x, SomeClass), forceDef).

Like the previous step, forcing a def is also controlled by several constraints:
1) a tuple is added to the waitingQueue if and only if the corresponding $(m_{def}, m_{use}, F, C)$ has not already been covered; the reason is that we should not force another execution of an intra-class def-use pair that has already been executed;
2) if the executed sequence $s_k$ has been obtained by our guiding procedure, we do not permit the execution of the Try Forcing a Def step; the reason is that $s_k$ may already contain a sub-sequence that has been processed by this step, and thus reapplying it would result in highly duplicated tests;
3) a tuple is added to the waitingQueue if and only if the corresponding $(m_{def}, m_{use}, F, C)$ is not already present in another tuple from the waitingQueue; the primary rationale is that the test might have performed the def from $m_{def}$ but not the use from $m_{use}$. In this case, forcing an invocation of $m_{def}$ is not needed, since only a use has to be forced in order to cover the pair, and forcing the $m_{use}$ invocation has already been done by the Try Forcing a Use step.

4) Produce a Guided Test: As we have mentioned at the beginning of the guiding procedure description, when Randoop starts trying to generate a new test, we try to bypass the random selection of a method to be invoked. For that
purpose, the waitingQueue is fed with information as described in the previous paragraphs. If the waitingQueue is empty, we usually let Randoop produce a test in its normal manner (footnote 3). Otherwise, the first tuple from the queue is processed. Recall that this tuple has the form $(s_j, v_{i,j}, (m_{def}, m_{use}, F, C), selector)$. The selector has the symbolic value forceDef or forceUse and is used to choose which method to invoke: $m_{def}$ or $m_{use}$. The target reference of the invocation will be $v_{i,j}$ from the sequence $s_j$. For the remaining arguments of the invocation we perform exactly the same actions as Randoop, e.g., for a reference parameter, a compatible value $v_{x,y}$ from the test $s_x$ is randomly chosen as the actual value of the parameter. Finally, all selected test sequences are concatenated (including $s_j$) and the invocation of $m_{def}$/$m_{use}$ on $v_{i,j}$ is appended at the end. As can be seen, a test is produced in a very similar way to Randoop, but we explicitly specify the invoked method and the target reference to increase the likelihood of covering intra-class def-use pairs according to the all-uses criterion.

Footnote 2: Since a constructor can be executed only at object creation time, the def methods from the first two tuples are ignored in this step.

To exemplify the generation of a new test, we return to the code in Listing 1. While explaining the guiding procedure, we have seen that after the execution and analysis of the test in Listing 3, the waitingQueue contains the following tuples: (test1, var0, (SomeClass, a, x, SomeClass), forceUse), (test1, var0, (SomeClass, b, x, SomeClass), forceUse), (test1, var0, (b, a, x, SomeClass), forceDef) and (test1, var0, (b, b, x, SomeClass), forceDef). The first tuple is extracted and the first test (i.e., Test1) is duplicated to provide a reference (e.g., var0) on which to invoke method a (i.e., the method containing a use that should be executed). Since the method has no additional arguments, the new test (i.e., Test2) can be produced and it is shown in
Listing 4 (footnote 4). Here, the execution of the second test immediately covers an intra-class def-use pair: the definition of the instance variable x from the constructor and its use inside method a.

Listing 4: Producing a New Test

    // Test1
    SomeClass var0 = new SomeClass();

    // Test2
    SomeClass var0 = new SomeClass();
    int var1 = var0.a();

To avoid generating unnecessary tests we add some constraints that guard the test construction:
1) if all intra-class def-use pairs of class C corresponding to the current working tuple have been covered, the test construction is useless and no test is produced;
2) as mentioned in Section III-B2, for each instance we record the list of operations performing a def or a use of some field of the object. Consequently, when we force a call to a method $m_{def}$/$m_{use}$ on an object, we can check whether the resulting sequence of invocations has already been performed on another object of the same class. If such an object exists, then we do not create a new test. The reason is that the two objects might have the same state and thus applying our guiding procedure to them would likely have the same effect, resulting in duplicated tests. Moreover, if our guiding procedure fails to cover some def-use pair using an object, reapplying the procedure to an object with a similar state would probably also fail to cover that def-use pair.

Footnote 3: The single exception is described in Section III-B1.
Footnote 4: We mention that Randoop eliminates subsumed tests and thus the final JUnit test suite will not contain the first test.

IV. EVALUATION

To evaluate the proposed approach we have created a modified version of Randoop that includes our guiding technique. For this, we have used the ASM framework and the CodePro analysis tool. In this section we describe the conducted case study and we discuss the results.

A. Scenario

For our evaluation we have selected the Jung network/graph framework (footnote 5), implemented in Java. From this system we have eliminated some irrelevant classes that are usually
not targeted by testing activities (e.g., the implementations of other tests, etc.). Moreover, since we aim to interact with concrete objects, we have prepared for testing with the original/modified Randoop tool only the concrete non-nested non-inner accessible classes. Consequently, 250 classes from the Jung system remained to be used for our evaluation.

The main goal of our case study was to establish whether the proposed guiding technique can improve the intra-class def-use pair coverage compared to the random test generation technique implemented in Randoop, and whether the obtained coverage increases faster. Consequently, we have generated tests using i) the original Randoop and ii) the modified Randoop including the proposed guiding approach. For each test suite obtained, we have estimated the intra-class def-use pair coverage and we have compared the results. This process has been repeated for different execution time limits. In each execution we used the default Randoop options including the randomization seed (the only exceptions being the time limit and the list of tested classes, as previously described). Moreover, to limit the influence of non-determinism (footnote 6), we have repeated each execution 3 times and we have reported and compared the averages.

To ensure a proper comparison, we have included in the maximum execution time the extra time required by the operations of our guiding procedure (e.g., code instrumentation time, etc.). All test generations have been performed on the same computer, with a 2.5 GHz Intel i5 processor, 8 GB of RAM, 128 GB of SSD and running Mac OS X 10.8.5. For each run, the JVM has been configured with a heap of 4 GB.

B. Results

Table I gives the numerical results of our experiments, while Figure 2 shows a visual summary for easier comparison.

Footnote 5: http://jung.sourceforge.net/
Footnote 6: The original Randoop should be deterministic since it uses the same default randomization seed. However, the tested code may introduce non-determinism, the garbage collector might influence the real execution time, and the modified Randoop contains some non-deterministic implementation particularities (e.g., the "order" of elements when iterating over a set), etc.

Fig. 2: Experimental Results. (a) Intra-Class Def-Use Pair Coverage. (b) Number of Generated Tests.

TABLE I: Experimental Results

Time Limit   Non-Guided Coverage   Guided Coverage   Non-Guided Tests   Guided Tests
(sec.)       (% avg.)              (% avg.)          (avg.)             (avg.)
 400         25.3                  20.1               2091.3             2777.0
 800         30.8                  32.6               3254.7             4647.0
1600         35.9                  42.5               5176.0             6863.3
3200         37.5                  45.7               6905.3            10136.7
4000         39.7                  47.4               8693.0            11628.0
4400         40.8                  48.5               9238.0            13780.3

As can be observed, for small execution time limits (e.g., 400 seconds), our guiding procedure temporarily achieves lower intra-class def-use coverage than the original Randoop. However, this is expected: the guiding procedure performs many additional actions (e.g., code instrumentation, determining the def-use pairs in a concrete class, capturing runtime information, etc.), especially at the beginning of the test generation process. For a fair comparison, the time allocated to the modified Randoop includes these actions and thus its real time limit for test generation is actually smaller. Nevertheless, by inspecting our code during evaluation, we have identified potential ways to speed up these additional tasks and lower their overhead on the test generation procedure.

At about 800 seconds, our guiding procedure starts showing its advantage, see Figure 2(a). For longer time limits, the test suites produced with our guided approach achieve better coverage, by 6.4% on average. For the largest time limits employed, the improvement appears to stabilize at about 8%.

Another observation is that the guiding procedure tends to increase coverage faster than the pure random approach (see the improvement rate between 800 and 1600 seconds in Figure 2(a)). This is consistent with the expected behavior: each time an object of a class containing not yet covered intra-class
def-use pairs is detected in a test, the guiding procedure attempts to immediately extend that test by calling the required methods on that object to improve the coverage. This suggests that the improvements are the result of the guiding procedure itself and that they are not circumstantial.

We should also emphasize that for a time limit greater than 1600 seconds, the coverage improvement rate appears to be similar for the guided and for the unguided Randoop. This might be a sign that beyond this threshold the coverage improvement is actually achieved by the "unguided part" of the modified Randoop, exposing a limitation of the proposed technique. Intuitively, such a situation may appear when covering an intra-class def-use pair requires complex object protocol conditions for invoking the methods containing the targeted defs/uses (e.g., not simply invoking a method containing a use after a method containing a def of the same field).

We also present in Table I and Figure 2(b) the number of tests produced by the modified and unmodified Randoop for each time limit, averaged over the three runs. In essence, executing our guiding procedure generates a larger number of tests, by 40% on average. Since more tests are produced in order to increase the coverage by an average of 6.4%, this might be perceived as a disadvantage of our approach. However, this is not necessarily true, because it depends on the actual result revealed by a generated test. For instance, if it exposes an error or an object protocol requirement (e.g., a method must always be invoked before another one) that is not observed by the test, then the additional test is extremely valuable. However, at this time, we have not performed a manual analysis of these tests and thus we cannot draw a conclusion with respect to this issue.

Evaluating the results of this experiment, we argue that our guiding technique can augment feedback-directed random test generation in order to more quickly produce better tests
with respect to the intra-class def-use testing quality criterion. Although the gain might be considered relatively small at this time, we point out that it was obtained with an unoptimized adaptation, as discussed at the beginning of this section.

C. Threats to Validity

In terms of external validity, we have used only one system and thus we cannot reliably generalize the conclusion. However, the analyzed program is real (i.e., not fabricated), which should increase the relevance of our results.

In terms of construct validity, our guiding procedure augments the Randoop source code. To the best of our understanding, the results should not depend on other interactions besides the intended guiding procedure; however, this cannot be completely excluded. Another issue regarding construct validity comes from the way in which we approximate def-use pairs by identifying them using the methods containing the corresponding statement. However, we consider that this approximation is acceptable for the current state of our work.

V. Related Work

Our paper aims to automatically generate a test suite with good dataflow coverage. Applying dataflow testing to object-oriented programs was first investigated by Harrold and Rothermel. They distinguish between def-use pairs at three levels: intra-method, inter-method (when exercised within a call to a single public method), and intra-class (resulting from calls to an arbitrary pair of public methods). We focus on intra-class testing as being the most comprehensive (and needed especially to test libraries).

Finding precise def-use relations can be problematic in itself; the algorithm presented in [...] takes into account the typical obstacles in object-oriented analysis (dynamic dispatch, imprecise concrete types, aliasing, exceptions) and can be used in test generation. Our prototype uses an approximation by considering method pairs that define and use the same field. A more precise def-use analysis would avoid generating tests that try to exercise infeasible
def-use pairs; at the same time, the reported coverage would be higher, being measured relative to a smaller set of def-use pairs to be covered. Tsai et al. show that dataflow anomalies have to be considered in order to obtain a complete set of relevant test cases, and do this in a preliminary stage before test case generation.

Several studies provide empirical evidence that dataflow coverage criteria are useful in testing object-oriented software. A test strategy that combines the all-bindings and all du-pairs criteria is presented in [...]; it is shown that more than 80% of object-oriented faults can be detected, although def-use relations are tracked only per object rather than at the level of individual fields. In [...], def-use coverage is employed also for inter-class (integration) testing, by using contextual information to test state-dependent behavior; this is shown to be effective for detecting seeded mutants.

An approach to detect state-dependent failures is constructed in [...] by augmenting dataflow analysis (which merely produces du-pairs) with symbolic execution to achieve the necessary conditions and automated deduction that generates call sequences conforming to the needed pre- and postconditions. While the combination is supposed to fall back on just dataflow analysis in case of complex programs, it is only illustrated on a simple case study. In comparison, by adapting Randoop we have chosen a more lightweight approach, but one which is shown to work on a real program of significant size.

To increase coverage, evolutionary algorithms have been employed. Tonella uses genetic algorithms to generate test sequences; relevant features include the methods to invoke and the created objects to use for these invocations, which are also used in our approach to guiding test generation. A genetic algorithm aimed specifically at dataflow testing is presented in [...]; it also achieves a higher coverage than random testing, using a smaller test suite. A large-scale case study, implementing
dataflow criteria for the EvoSuite tool, is reported in . The coverage level achieved (54% of def-use pairs) is comparable to our results; despite the relatively low value, achieving a higher mutation score than for branch coverage confirms the benefits of dataflow testing.

VI. Conclusion

We have presented in this paper an approach to guide a well-known random test generation technique in order to more quickly produce better test suites with respect to the intra-class def-use coverage testing criterion for object-oriented programs. We have also presented our initial evaluation of the approach, emphasizing its potential. Although the identified gain might be considered relatively small, it shows that it is worth investing work in optimizing some implementation details that might improve the current results. Consequently, this would be one of the main directions for future work. At the same time, we would like to identify other test generation guiding approaches to address other object-oriented dataflow testing criteria. Last but not least, we should also investigate the advantages of producing test suites according to the previous criteria in order to find defects within the tested applications.

References

C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball, "Feedback-directed random test generation," in 29th International Conference on Software Engineering (ICSE). IEEE Computer Society, 2007, pp. 75-84.
M. J. Harrold and G. Rothermel, "Performing data flow testing on classes," in Proceedings, 2nd ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE). ACM, 1994, pp. 154-163.
INRIA and France Telecom, "Introduction to the ASM 2.0 bytecode framework," http://asm.ow2.org/doc/tutorial-asm-2.0.html
R. Marinescu, G. Ganea, and I. Verebi, "inCode: Continuous quality assessment and improvement," in 14th European Conference on Software Maintenance and Reengineering (CSMR). IEEE Computer Society, 2010, pp. 274-275.
R. Chatterjee and B. G. Ryder, "Data-flow-based testing of object-oriented
libraries," Rutgers University, Tech. Rep. DCS-TR-433, 2001.
B.-Y. Tsai, S. Stobart, and N. Parrington, "Employing data flow testing on object-oriented classes," IEE Proceedings - Software, vol. 148, no. 2, pp. 56-64, 2001.
M.-H. Chen and H. Kao, "Testing object-oriented programs - an integrated approach," in International Symposium on Software Reliability Engineering. IEEE Computer Society, 1999, pp. 73-82.
G. Denaro, A. Gorla, and M. Pezzè, "An empirical evaluation of data flow testing of Java classes," University of Lugano, Tech. Rep. 2007/03.
U. A. Buy, A. Orso, and M. Pezzè, "Automated testing of classes," in Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). ACM, 2000, pp. 39-48.
P. Tonella, "Evolutionary testing of classes," in Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). ACM, 2004, pp. 119-128.
A. S. Ghiduk, M. J. Harrold, and M. R. Girgis, "Using genetic algorithms to aid test-data generation for data-flow coverage," in 14th Asia-Pacific Software Engineering Conference (APSEC). IEEE Computer Society, 2007, pp. 41-48.
M. Vivanti, A. Mis, A. Gorla, and G. Fraser, "Search-based data-flow test generation," in 24th IEEE International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2013, pp. 370-379.

Duplicate code detection using anti-unification

Peter Bulychev, Lomonosov Moscow State University, Russian Federation (peter.bulychev@gmail.com)
Marius Minea, Institute e-Austria Timisoara, Romania (marius@cs.utt.ro)

Abstract: This paper describes a new algorithm for finding software clones. It is conceptually independent of the source language of the analyzed programs, working at the level of abstract syntax trees. The algorithm considers that two sequences of statements form a clone if one of them can be obtained from the other by replacing some subtrees. To our knowledge, this notion was not previously employed in the literature. It allows taking into account all information on the syntactic structure of
a program. We have implemented this algorithm in the tool Clone Digger; it currently supports the Python and Java languages. Clone Digger is free and provided under the GPL license.

I. INTRODUCTION

Different researchers report that the amount of duplicate code in software systems varies from 6.4%-7.5% to 13%-20%. Duplicate code can occur as a result of approaches to development and maintenance, due to language or programmer limitations, or simply by accident. Code duplication can be a significant drawback, leading to bad design and an increased probability of bug occurrence and propagation. As a result, it can significantly increase maintenance cost (for instance, any bug in the original has to be fixed in all duplicates) and form a barrier to software evolution. Consequently, duplicate code detectors are a useful class of software analysis tools. Such tools can aid in measuring the quality of software systems and in the process of refactoring.

Techniques for detecting duplicate code can be classified according to several criteria. Code can be viewed as similar based on syntactic criteria or at a semantic level (from the point of view of execution effects). In this paper we consider only syntactic similarity. Within this category, duplicate clone detection can be performed at different levels of granularity: strings, tokens, abstract syntax trees, feature vectors. The first two are quite rigid and low-level; therefore we use an approach based on abstract syntax trees.

Two sequences of statements form duplicate code if they are similar enough according to a selected measure of similarity. Such measures can be defined using a set of allowed editing operations and their cost. According to there are three different types of syntactic changes: adding or removing whitespace and comments, changing names of variables, and more complex modifications. We aim to detect a wide range of clones, including the third type, e.g., expressions with similar structure. In essence, we wish to
characterize the structural similarity of two code fragments in order to determine whether they should be classified as code duplicates. We can formalize this by using the concept of anti-unifier, which denotes the most specific generalization of two terms. Anti-unification was first described by Plotkin and Reynolds. In the current paper we use anti-unification to calculate the distance between two abstract syntax trees and to group similar trees into equivalence classes called clusters. Anti-unification captures the structural differences between two trees: it allows, for instance, the replacement of a variable with a more complex expression, but distinguishes between functions of different arities.

Our algorithm for finding duplicates consists of several phases. In the beginning we partition all statements into clusters using anti-unification distance; as a result, the code is abstractly viewed as a sequence of cluster identifiers. Next, we find all pairs of identical sequences of cluster IDs. The matching pairs of sequences, which have similar statements in corresponding positions, are then globally checked for similarity. This check is again performed using anti-unification distance, and duplicates are reported if the distance is below a certain threshold.

A fully syntactic abstraction in duplicate clone detection is first reported in . Their algorithm detects a similarity between, e.g., a and a[x+1] by reducing them to the pattern a[?].
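The reduction of two expressions to a shared pattern can be sketched in a few lines of Python. This is only an illustrative sketch and not the Clone Digger implementation: the tuple encoding of syntax trees, the function name `anti_unify`, and the placeholder names `?1`, `?2`, ... are all assumptions made for this example.

```python
from itertools import count


def anti_unify(t1, t2, holes=None, fresh=None):
    """Return a most specific generalization of two terms.

    Hypothetical encoding: a leaf is a string, an inner node is a tuple
    (label, child_1, ..., child_n). Differing subterms are replaced by
    placeholders "?1", "?2", ...; the same pair of subterms always maps
    to the same placeholder.
    """
    if holes is None:
        holes = {}
    if fresh is None:
        fresh = count(1)
    if t1 == t2:
        return t1
    # Same label and same arity: generalize the children pairwise.
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        children = tuple(anti_unify(a, b, holes, fresh)
                         for a, b in zip(t1[1:], t2[1:]))
        return (t1[0],) + children
    # Otherwise the two subterms differ structurally: introduce a placeholder.
    key = (t1, t2)
    if key not in holes:
        holes[key] = "?%d" % next(fresh)
    return holes[key]


# Assumed encodings of the expressions a[i] and a[x + 1].
left = ("index", "a", "i")
right = ("index", "a", ("add", "x", "1"))
pattern = anti_unify(left, right)
```

Under these assumed encodings, `pattern` is `("index", "a", "?1")`: the variable `i` and the expression `x + 1` are both covered by one placeholder, while two calls of the same function with different numbers of arguments would collapse into a single placeholder at the top. The result corresponds to the pattern a[?].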
This pattern can be seen as the anti-unifier of the two expressions. Our work continues this approach based on patterns, extending it to cover more complex programming constructs such as sequences of statements. We use a more natural and flexible way of building patterns; moreover, we provide metrics to assess similarity, whose quality increases if the occurrences of the same variable (in the same scope) refer to the same leaf in the abstract syntax tree. Our algorithm is also conceptually independent of the programming language, working at the level of abstract syntax trees.

II. PRELIMINARIES

A. Anti-unification

Anti-unification was first studied in . As the name suggests, given two terms, it produces a more general one that covers both, rather than a more specific one as in unification. Let E1 and E2 be two terms. Term E is a generalization of E1 and E2 if there exist two substitutions

$2^{AP}$ a function that labels each state with some subset of a set $AP$ of atomic propositions. A transition $\alpha$ is enabled in state $s$ if there is some state $s'$ for which $\alpha(s, s')$ holds. We denote the set of transitions enabled in $s$ by $enabled(s)$. If for any state $s$ there is at most one state $s'$ with $\alpha(s, s')$, we say that $\alpha$ is deterministic and we will write $s' = \alpha(s)$. In the following, we will consider only deterministic transitions. Note that although the transitions are deterministic (a usual practice in modeling concurrency), we can easily model non-deterministic choice (between different transitions that are enabled at the same time).

We introduce the key concept of independent transitions. These are transitions whose respective effects are the same, irrespective of their relative order.

Definition 1. Two transitions $\alpha$ and $\beta$ are independent if for every state $s$ the following two conditions hold:
Enabledness: if $\alpha, \beta \in enabled(s)$ then $\beta \in enabled(\alpha(s))$ and $\alpha \in enabled(\beta(s))$.
Commutativity: if $\alpha, \beta \in enabled(s)$ then $\alpha(\beta(s)) = \beta(\alpha(s))$.
In other words, a pair of transitions is independent if at any state
executing either of them does not disable the other, and executing both in either order leads to the same state. Two transitions are called dependent if they are not independent.

To construct the reachable state space, model checking algorithms perform a traversal of the state-transition graph (typically depth-first or breadth-first search). The traversal starts from the set of initial states and successively constructs new states by exploring the transitions that are enabled in the current state. Partial order reduction differs from full state exploration in that at each step it considers only a subset of the transitions enabled at the current state $s$. This set is denoted by $ample(s)$. With a good choice of $ample(s)$, only a small fraction of the reachable state space will be explored. On the other hand, a number of conditions must be enforced on this set to ensure that the truth value of the checked property is preserved in the reduced model. In the following, we give a set of such conditions together with an informal explanation of their role. A complete treatment of these conditions together with a formal proof is given in . Condition C0 is the simplest and guarantees that if a state has a successor in the original model, it also has a successor in the reduced model.

C0 [Non-emptiness condition] $ample(s) = \emptyset$ if and only if $enabled(s) = \emptyset$.

C1 [Ample decomposition] On any path starting from state $s$, every transition that appears before a transition in $ample(s)$ is executed is independent of all the transitions in $ample(s)$.

To explain C1, note that not every transition sequence in the original model may appear in the reduced model, since the latter is restricted to transitions from $ample(s)$ at each state $s$. However, C1 ensures that some transition from $ample(s)$ may be taken in the reduced model without disabling any of the transitions in the original sequence. Consider any transition sequence $L(s) = L(s')$. A state $s$ is called fully expanded if $ample(s) = enabled(s)$. In this
case, all transitions are selected for exploration and no reduction is performed at this point.

C2 [Non-visibility condition] If there exists a visible transition in $ample(s)$ then $s$ is fully expanded.

Revisiting the two cases discussed for condition C1, it can be seen that in each of these cases, $\alpha$ is an invisible transition (since $s_0$ is not fully expanded), and therefore the two paths considered will be stuttering equivalent. Finally, we have to ensure that an enabled transition which does not belong to an ample set will eventually be taken. Otherwise, the constructions outlined in the discussion of C1 may close a cycle in the reduced state graph while never taking a non-ample transition which is enabled throughout the cycle. Consequently, some transitions can be ignored and the truth value of a specification in the two models may no longer be the same. Condition C3 is introduced to eliminate this problem:

C3 [Cycle closing condition] At least one state along each cycle of the reduced state graph is fully expanded.³

As stated in the introduction, our goal was to develop a reduction algorithm which is not restricted to depth-first explicit state search, like the typical one described for instance in . The principles and implementation of this algorithm are described below.

2.2 A Generic Partial Order Reduction

The cycle closing condition C3 is very natural to check while performing a depth-first search. However, it cannot be checked directly when performing a breadth-first search (which is intrinsic to the symbolic methods), and therefore it seems that significant modifications to the model checking algorithms are needed to accommodate it (cf. [1]). We show, however, that it is possible to ensure C3 by performing static checks on the local state-transition graphs of each process. Conceptually, this method is able to perform a reduction (in terms of the number of reached states) at least as good as the traditional dynamic algorithms, although in practice there is a trade-off
between the computational cost of the static reduction and the computational savings afforded by the reduced model during the dynamic state space search. In fact, the most efficient balance in our algorithm may be achieved with varying degrees of state space reduction.

To describe our algorithm, we first note that both C2 and C3 limit the extent to which reduction can be performed: they define cases where a state has to be fully expanded. Moreover, if a cycle contains a visible transition, then C2 guarantees that the state at which that transition is taken is fully expanded, and therefore C3 holds for that cycle as well. This suggests that C2 and C3 can be combined into a single condition C2':

C2' There exists a set of transitions T which includes all visible transitions, such that any cycle in the reduced state space contains a transition from T. When $ample(s)$ includes a transition from T, $s$ is fully expanded.

We call the transitions in T sticky transitions, since intuitively, they stick to all other enabled transitions. To perform reduction during compilation of the modelled system, our goal is to determine a set T of sticky transitions that breaks all cycles of the reduced state graph, in order to guarantee C2'. We assume that the system to be verified is given as a set of component processes. Then, an easy way to find such a set T is to look at the static control flow graph of each process of the system. Any cycle in the global state space projects to a cycle (or possibly a self-loop) in each component process. By breaking each local cycle, we are guaranteed to break each global cycle. This suggests strengthening C2' to the following condition C2'':

³ There are other stronger and weaker conditions that can be used instead of Condition C3. This particular version fits well with our framework.

C2'' There is a set of sticky transitions that includes all visible transitions. Each cycle in the static control flow of a process of the modelled system contains at least one sticky
transition, and if $ample(s)$ includes a sticky transition, then $s$ is fully expanded.

An ideal algorithm would find a minimal set of sticky transitions, in order to maximize the possible reduction. However, this problem is at least as hard as reachability analysis. On the other hand, efficient reduction can still be achieved even without a minimal set. During the state search, priority is given to non-sticky transitions. In this way, full expansion of a state is avoided as much as possible, although eventually no cycle can be closed without performing one full expansion. It is possible therefore that several sticky transitions are delayed until all of them can be taken from the same state, which reduces the effect of selecting too many sticky transitions.

Even with delaying sticky transitions, it is still important that the static analysis generates a small number of sticky transitions, and yet is simple enough not to require excessive overhead. The next section presents such an algorithm, which is heuristically likely to generate a smaller number of sticky transitions than required by C2''. The set of sticky transitions found by the algorithm guarantees C2', and in the worst case it corresponds to C2''.

2.3 Finding Sticky Transitions

We assume that the system to be analyzed is given as a set of variables $V$ and a set of processes $\{P_1, P_2, \ldots, P_n\}$. We also assume that, for each process $P_i$, there exists a variable $cp_i \in V$, called the control point (or program counter) of process $P_i$, which always keeps the current local state of the process. A transition of $P_i$ updates $cp_i$ (without necessarily changing its value) and also updates some other variables from $V$. The state space of the system is simply given by all possible valuations of the variables in $V$. The state-transition graph of the system is derived from the local state-transition graphs of the processes by using interleaving semantics to model concurrency. A local (resp. global) cycle is a cycle in the state-transition graph of a process
(resp. the system). An execution of a cycle is the execution of all the transitions in the cycle starting from a state in the cycle. An execution of a local cycle of a process $P_i$ restores the value of $cp_i$. But along the cycle, the values of variables other than $cp_i$ can be changed as well, without necessarily being restored by a complete execution. We call this the side effect of a local cycle on a variable and observe four different types of side effects: (1) decrementing effect, if the execution of the cycle always reduces the value of the variable, (2) incrementing effect, if the execution of the cycle always increases the value of the variable, (3) complex effect, if the effect of the execution of the cycle on the variable cannot be determined statically, and (4) no effect, if the variable is not changed by any of the transitions in the cycle.

If the side effect of a local cycle $c$ is incrementing or decrementing over the value of a variable $v$, it is impossible to have a global cycle in which only $c$ is executed. There must be some other local cycle $c'$ executed in the global cycle to compensate for the side effect of $c$ on $v$. For every global cycle in which $c$ is executed, $c'$ must be executed as well. Therefore, there is no need to select a sticky transition from both $c$ and $c'$, since neither $c$ nor $c'$ can appear alone in a global cycle.

Let $C$ denote the set of local cycles in the system. We assume the existence of a function $f : C \times V \to \{-, +, *, 0\}$ such that for $c \in C$ and $v \in V$, $f(c, v) = -$ ($f(c, v) = +$, $f(c, v) = *$, $f(c, v) = 0$, respectively) means a decrementing effect (incrementing, complex, no effect, respectively) on $v$ by $c$. One can always assume $f(c, v) = *$ if $v$ is updated within $c$ but the side effect is difficult to analyze.

Definition 2. A set of local cycles $H \subseteq C$ covers another set of local cycles $G \subseteq C$ if any global cycle that contains (projects to) a local cycle $c \in G$ also has to contain some local cycle $c' \in H$. In the particular case where $G$ is a singleton set $\{c\}$, we will simply
say that $H$ covers $c$. We can effectively find a set of cycles that covers a local cycle $c$ by considering the effect of $c$ on some variable $v$. For a given local cycle $c$ and a variable $v$, let $c_v$ be the set of local cycles that can compensate the incrementing or decrementing effect of $c$ on $v$, which is formally defined as:

$c_v = \{ c' \in C \mid (f(c, v) = - \text{ and } f(c', v) \in \{+, *\}) \text{ or } (f(c, v) = + \text{ and } f(c', v) \in \{-, *\}) \}$.

Since $c_v$ contains all cycles that can have the opposite effect on $v$ compared to $c$, it follows that $c_v$ covers $c$. This implies that if for some variable $v$ all cycles in $c_v$ have a sticky transition, there is no need for $c$ to have a sticky transition.

Our goal is to find a subset $T$ of sticky transitions that breaks (when removed from the local process graphs) some set $H$ of local cycles such that $H$ covers the entire set of local cycles $C$. Then, since every global cycle contains some local cycle $c \in C$, it also has to contain a cycle from $H$, and with it a sticky transition. Consequently, condition C2' holds. To find such a set, note that trivially $H$ covers $H$ for any $H \subseteq C$. We also have the following lemma:

Lemma 3. Let $H, G \subseteq C$ and $c \in C$. If $H$ covers $G$ and $G$ covers $c$, then $H$ covers $G \cup \{c\}$.

Proof: We need to show that for any global cycle $G_i$, if $G_i$ contains a local cycle $g \in G \cup \{c\}$ then it has to contain a local cycle in $H$. If $g \in G \cup \{c\}$ then we have two cases: either $g \in G$ or $g = c$.
Case (i): If $g \in G$, then since $H$ covers $G$, $G_i$ must have a local cycle in $H$.
Case (ii): If $g = c$, then $G_i$ has to contain a local cycle $g' \in G$, since $G$ covers $c$. Furthermore, since $G_i$ contains $g' \in G$ and $H$ covers $G$, $G_i$ contains some cycle in $H$.
Together, these two cases show that $H$ covers $G \cup \{c\}$.

Algorithm 1, given in Fig. 2, uses this lemma to compute a set $H$ such that $H$ covers $C$. It alternates between analyzing the effect of local cycles on variables to increase the covered set $G$, and adding cycles to $H$ if there are still uncovered cycles in $C$.

Algorithm 1
 0  choose $H \subseteq C$, let $G := H$
 1  loop
 2    do
 3      let updated := false
 4      $\forall c \in C \setminus G$, $\forall v \in V$
 5        if $f(c, v) \in \{-, +\}$ and $c_v \subseteq G$ then
 6          let $G := G \cup \{c\}$
 7          let updated := true
 8    while (updated)
 9    if ($G = C$) return $H$
10    let $H := H \cup C_{add}$, $G := G \cup C_{add}$ for some $C_{add} \subseteq C \setminus G$, $C_{add} \neq \emptyset$
11  endloop

Fig. 2. An algorithm to find $H \subseteq C$ such that $H$ covers $C$.

It is possible not to take a variable $v$ into account during the local cycle analysis by simply assuming that $f(c, v) = *$ for all local cycles. One can also assume the existence of auxiliary variables to produce the dependency relation between cycles. For example, if there is a variable $q$ of type queue in the system, we can assume that there is also an integer variable $q_i$, which always keeps the number of elements in this queue variable. It is hard to define the side effect of a push or pop operation on $q$, but they are incrementing and decrementing on $q_i$, respectively.

In the extreme case where $\forall c \in C, \forall v \in V, f(c, v) \in \{*, 0\}$, Algorithm 1 terminates with $H = C$ as the worst case, which corresponds to satisfying Condition C2'', since we have to choose a sticky transition from each local cycle.

The selection of the initial set of marked cycles, let us call it $C_m$, can be arbitrary. A good starting value is given by the sticky transitions which are already required by C2'. In particular, $C_m$ can be chosen to be the set of all cycles that include a visible transition.

2.4 The COSPAN Implementation

The static partial order reduction technique explained in this paper has been implemented for the SDL and S/R source-target pair of languages. Nevertheless, the method is not specific to this pair of languages. We give the details of this particular implementation in this section.

Our method of applying the reduction entails the modification of the analyzed system such that a transition which is not in the ample set for a given state is simply not enabled. In other words, the set of enabled transitions at some state in the modified system is exactly an ample set at that state if the original system were analyzed with a modified search
algorithm. This property enables us to use any search technique to analyze the modified system. In our case, we are able to use both explicit and symbolic search techniques and also apply localization reduction together with the partial order reduction.

In order to achieve in COSPAN a partial order reduction that is independent of the search control, we exploit the selection mechanism of S/R. The language provides selection variables, which are not part of the state, and thus do not incur any memory overhead. When deciding on the successor state, each process chooses non-deterministically among some possible values of its selection variables. The choice of any process can be dependent on the choice of the selections of the other processes (as long as this relationship is acyclic).

In the compilation phase from SDL to S/R, first the visible transitions are tagged as sticky. Algorithm 1 is then executed to find a sufficient set of sticky transitions, with the initial selection $C_m$ being the set of local cycles that include a visible transition. Also, for each local state of a process, we calculate whether the transitions departing from that local state satisfy Condition C1. If the process has only internal transitions (transitions that refer only to local variables), then it is clear that the transitions originating from that local state of the process satisfy C1, since no other process can refer to those variables. Similarly, when the process has only enabled receiving transitions, the transitions of the process again satisfy C1. Although the send transition of another process can change the same message queue from which the receiving transition reads, their execution order does not matter. Depending on the topology of the system, even a send transition of a process can satisfy C1, for example if there is no other process that can send a signal to the same message queue.

Note that the compilation is dependent on the property to be checked (or more precisely,
on the set of visible transitions). Therefore, a new compilation is required for each property that imposes different visible transitions in the system.

In the current version of our compiler, a process is considered to be ample at a state if it does not have a sticky transition and all of its transitions satisfy C1 at its current local state. Each process sets a global combinational flag to true or false depending on its ampleness at a global state. From all the ample processes at a state, a process with the least number of outgoing transitions is chosen as the candidate for execution. If more than one process has the least number of transitions, a static priority (index number) is used to choose only one process. If there is no ample process at a state, then all the processes are chosen as candidates for execution. A process does not have any enabled transition unless it is selected as one of the candidates for execution. This candidate election mechanism is implemented using the primitives of S/R and is embedded in the source code of the analyzed system without causing any state space overhead. We added approximately 1000 lines of code to the original compiler (which was around 9000 lines before the addition) to implement the reduction.

3 Experimental Results

This section gives experimental results for our method. The examples specified in SDL are translated into S/R using the compiler incorporating the static partial order reduction approach explained in this paper.

The first example is a concurrent sort algorithm. There are N + 1 processes which sort N randomly generated numbers. One of the processes simply generates N random numbers and sends them to the next process on the right. Each process that receives a new number compares it with the current number it has and sends the greater one to the process on the right. The rightmost process receives only one number, which is the largest one generated by the leftmost process.

The second example is a leader election protocol
given in . It contains N processes, each with an index number, that form a ring structure. Each process can only send a signal to the process on its right and can receive a signal from the process on its left. The aim of the protocol is to find the largest index number in the ring. The protocol is verified with respect to all possible initial states.

The final example is an asynchronous tree arbiter as taken from , whose purpose is to solve the mutual exclusion problem. A resource is arbitrated between N users by a tree of arbiter cells. Each arbiter cell can have at most two children and forwards a request coming from its children to the upper level of the tree. When an arbiter cell receives the grant, it passes the grant to the child that requested the resource. If both of the children are requesting, the grant signal is sent nondeterministically to one of them. When the resource is released, the release information is sent to the root of the tree along the branch connecting the root and the user that released the resource. An acknowledgement is also sent back by the root to the user, using the same branch in the tree.

Table 1 gives the measurements we have obtained so far on these examples. The examples above showed that in case of small state spaces, the symbolic search with partial order reduction is more expensive than an explicit search with partial order reduction. It is even more expensive than a symbolic search on the original system without any partial order reduction. As the state space gets bigger, the symbolic search with partial order reduction starts doing better than the symbolic search without reduction. For large systems, the symbolic search with partial order reduction becomes the fastest of all the alternatives.

The concurrent sort example has an interesting property for the application of Algorithm 1. We have introduced an artificial integer variable for each message queue in the system that is assumed to keep the number of messages in the queue. When Algorithm 1 is
executed by taking into account only these artificial variables with $C_m = \emptyset$ initially, it returns $H = \emptyset$. The reason is that, even though there are cycles in the local graphs of the processes, the global state space has no cycles, and this can be determined by a syntactic analysis.

Since the ample set reduction is applied completely statically, it cannot benefit from all the information available to a dynamic algorithm. For example, Condition C3 is satisfied by predicting the cycles in the global state space at the syntactic level. It is possible that Algorithm 1 will try to break global cycles that can actually never occur. A reduction algorithm that breaks global cycles as they appear during the analysis seems to be more fine-tuned for the reduction. However, the produced experimental results are as good as those obtained by dynamic algorithms.

Experiments            No Reduction (no. of states)   Ample Reduction (no. of states)
Sort with N = 2                 191                             66
Sort with N = 3                4903                            553
Sort with N = 4              135329                           4163
Sort with N = 5             3940720                          29541
Leader with N = 2               383                            107
Leader with N = 3             11068                            490
Leader with N = 4            537897                           3021
Leader with N = 5          26523000                          21856
Arbiter with N = 2               73                             48
Arbiter with N = 4            18247                           4916
Arbiter with N = 6          3272700                         358352

Table 1. Experimental Results

4 Summary

Model checking tools are highly complex and are required to have good performance. On the other hand, the state space explosion problem forces the tool implementors to incorporate the possible reduction techniques into the tools, making the implementation more complex. Frequently, it is not straightforward to implement a reduction technique on top of the search technique used by a model checker. Until recently, there were no implementations that combine partial order reduction and symbolic search techniques, although both methods were known for a long time and had good implementations separately.

We have demonstrated a way to compute a partial order reduction of an asynchronous system statically. This
facilitates implementation of the reduction into model-checking tools without the need to alter the search algorithms. In particular, our method allows combining partial order reduction with symbolic search. Although our implementation of the method uses SDL and S/R as the source and the target languages, the method itself is not specific to these languages.

Experimental results indicate that for small models, static partial order reduction is faster with an explicit state representation. However, for large models, the symbolic search is not only faster, but completes on models which are computationally infeasible with reduction based on an explicit state search.

References

1. R. Alur, R. K. Brayton, T. A. Henzinger, S. Qadeer, and S. K. Rajamani. Partial order reduction in symbolic state space exploration. In Proceedings of the Conference on Computer Aided Verification (CAV'97), Haifa, Israel, June 1997.
2. E. Bounimova, V. Levin, O. Basbugoglu, and K. Inan. A verification engine for SDL specification of communication protocols. In S. Bilgen, U. Çaglayan, and C. Ersoy, editors, Proceedings of the First Symposium on Computer Networks, pages 16-25, Istanbul, Turkey, May 1996.
3. C. T. Chou and D. Peled. Formal verification of a partial-order reduction technique for model checking. In Proceedings of the Second International Workshop on Tools and Algorithms for the Construction and Analysis of Systems, pages 241-257, Passau, Germany, 1996. Springer-Verlag. Volume 1055 of Lecture Notes in Computer Science.
4. D. L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-Independent Circuits. MIT Press, 1989.
5. D. Dolev, M. Klawe, and M. Rodeh. An O(n log n) unidirectional distributed algorithm for extrema finding in a circle. Journal of Algorithms, 3:245-260, 1982.
6. P. Godefroid and D. Pirottin. Refining dependencies improves partial-order verification methods. In Proc. 5th Conference on Computer Aided Verification, volume 697 of Lecture Notes in Computer Science, pages 438-449, Elounda, June 1993.
Springer-Verlag 7 R H Hardin, Z Har’El, and R P Kurshan COSPAN in Proc CAV’96, volume 1102, pages 423-427 LNCS, 1996 8 G J Holzmann Design and Validation of Computer Protocols Prentice-Hall, 1992 9 G J Holzmann and D Peled An improvement in formal verification in Formal Description Techniques 1994, pages 197-211, Bern, Switzerland, 1994 Chap-man&HaJl 10 R Kurshan Computer-Aided Verification of Coordinating Processes Princeton University Press, 1994 11 L Lamport What good is temporal logic in iFiP Congress, pages 657-668 North Holland, 1983 in Computer Science 115 12 D Peled Combining partial order reductions with on-the-fly model checking Formal Methods in System Design, 8:39-64, 1996 13 Functional Specification and Description Language (SDL), CCiTT Blue Book, Rec-ommendation Z 100 Geneva, 1992 14 A Valmari A stubborn attack on state explosion in Proc 2nd Workshop on Computer Aided Verification, volume 531 of Lecture Notes in Computer Science, pages 156-165, Rutgers, June 1990 Springer-Verlag ii Task Description Thomas Lindner Forschungszentrum informatik, Karlsruhe Abstract This chapter presents a case study in the field of control systems The task consists of developing verified control software for a model representing a production cell installed in a metal-process-ing plant in Karlsruhe The paper describes the functionality of the model, explains how the control program relies on the system’s sensors, discusses the possibilities for driving the model with the help of various actuators, and finally defines the requirements that are to be fulfilled by the control software 2 1 Description of the Production Cell The Forschungszentrum informatik has created a model of a production cell for mounting frames which was built as part of a study in microcomputer technology in 1989 This is not a model only in theory: it represents an actual industrial instal-lation in a metal-processing plant in Karlsruhe The case study presents a realistic industry-oriented problem, where 
safety requirements play a significant role and can be met by the application of formal methods. The manageable size of the task allows for experimenting with several approaches.

The production cell processes metal blanks which are conveyed to a press by a feed belt. A robot takes each blank from the feed belt and places it into the press. The robot arm withdraws from the press, the press processes the metal blank and opens again. Finally, the robot takes the forged metal plate out of the press and puts it on a deposit belt (see figure 1).

Figure 1 Top view of the model

This basic sequence is complicated by further details:
• To enhance the utilization of the press, the robot is fitted with two arms, thus making it possible for the first arm to pick up a blank while the press is forging another plate.
• The robot arms are placed on different horizontal planes, and they are not vertically mobile. This explains why an elevating rotary table has to be intercalated between the feed belt and the robot.
• Another consequence of the fact that the two robot arms are at different levels is that the press has not only two, but three states: open for unloading by the lower arm, open for loading by the upper arm, and closed (pressing).
• In order to perform demonstrations with the model, the production sequence should be able to run without an operator. The "forged" metal plates (which the press in the model does not actually modify) are therefore taken from the deposit belt back to the feed belt by a travelling crane, thus making the entire sequence cyclical.
• A photoelectric cell at the end of the deposit belt informs the control program about the arrival of metal plates to be picked up by the travelling crane.

The general sequence (from the perspective of a metal plate) is the following:
1. The feed belt conveys the metal plate to the elevating rotary table.
2. The elevating rotary table is moved to a position adequate for
unloading by the first robot arm.
3. The first robot arm picks up the metal plate.
4. The robot rotates counterclockwise so that arm 1 points to the open press, places the metal plate into it and then withdraws from the press.
5. The press forges the metal blank and opens again.
6. The robot retrieves the metal plate with its second arm, rotates further and unloads the plate on the deposit belt.
7. The deposit belt transports the plate to the travelling crane.
8. The travelling crane picks up the metal plate, moves to the feed belt, and unloads the metal plate on it.

This description of the system is of course rather simplified. First, the individual system components are not specified in detail. Secondly, the cell is configured so that several metal plates can be processed and transported simultaneously; this should allow an optimal utilization of the cell capacity.

2.1.1 Feed Belt

The task of the feed belt consists in transporting metal blanks to the elevating rotary table. The belt is powered by an electric motor, which can be started up or stopped by the control program. A photoelectric cell is installed at the end of the belt; it indicates whether a blank has entered or left the final part of the belt.

2.1.2 Elevating Rotary Table

The task of the elevating rotary table is to rotate the blanks by about 45 degrees and to lift them to a level where they can be picked up by the first robot arm. The vertical movement is necessary because the robot arm is located at a different level than the feed belt and because it cannot perform vertical translations. The rotation of the table is also required, because the arm's gripper is not rotary and is therefore unable to place the metal plates into the press in a straight position by itself.

Figure 2 Elevating rotary table (top view, front view)

2.1.3 Robot

The robot comprises two orthogonal arms. For technical reasons, the arms are set at two different levels. Each arm can retract or extend
horizontally. Both arms rotate jointly. Mobility on the horizontal plane is necessary, since elevating rotary table, press, and deposit belt are all placed at different distances from the robot's turning center.

Figure 3 Robot and press (top view)

The end of each robot arm is fitted with an electromagnet that allows the arm to pick up metal plates. The robot's task consists in:
• taking metal blanks from the elevating rotary table to the press;
• transporting forged plates from the press to the deposit belt.

The robot is fitted with two arms so that the press can be used to maximum capacity. Below, we describe the order of the rotation operations the robot has to perform, supposing that the feed belt delivers blanks frequently enough. We presuppose that initially the robot is rotated such that arm 1 points towards the elevating rotary table, and assume that all arms are retracted to allow safe rotation.
1. Arm 1 extends and picks up a metal blank from the elevating rotary table.
2. The robot rotates counterclockwise until arm 2 points towards the press. Arm 2 is extended until it reaches the press. Arm 2 picks up a forged work piece and retracts.
3. The robot rotates counterclockwise until arm 2 points towards the deposit belt. Arm 2 extends and places the forged metal plate on the deposit belt.
4. The robot rotates counterclockwise until arm 1 can reach the press. Arm 1 extends, deposits the blank in the press, and retracts again.
Finally, the robot rotates clockwise towards its original position, and the cycle starts again with 1.

Figure 4 Order of the robot's actions

In order to meet the various safety requirements described in section 2.3, a robot arm must retract whenever a processing step in which it is involved is completed.

2.1.4 Press

The task of the press is to forge metal blanks. The press consists of two horizontal plates, with the lower plate being movable along a vertical axis. The press operates by pressing the lower plate against the upper plate. Because the
robot arms are placed on different horizontal planes, the press has three positions. In the lower position, the press is unloaded by arm 2, while in the middle position it is loaded by arm 1. The operation of the press is coordinated with the robot arms as follows:
1. Open the press in its lower position and wait until arm 2 has retrieved the metal plate and left the press.
2. Move the lower plate to the middle position and wait until arm 1 has loaded and left the press.
3. Close the press, i.e. forge the metal plate.
This processing sequence is carried out cyclically.

Figure 5 Robot and press (side view): arm 1 loads the press; the press forges the plate; arm 2 unloads the press

2.1.5 Deposit Belt

The task of the deposit belt is to transport the work pieces unloaded by the second robot arm to the travelling crane. A photoelectric cell is installed at the end of the belt; it reports when a work piece reaches the end section of the belt. The control program then has to stop the belt. The belt can restart as soon as the travelling crane has picked up the work piece.

The system designer is free to decide if the belts are to run continuously and should be stopped only when necessary, or if they should stand still and move only when necessary.

2.1.6 Travelling Crane

The task of the travelling crane consists in picking up metal plates from the deposit belt, moving them to the feed belt and unloading them there. It acts as a link between the two belts that makes it possible to let the model function continuously, without the need for an external operator. In a more realistic setting, the travelling crane could unload the metal plates into a container, or link the production cell to a further manufacturing unit.

Figure 6 Travelling crane

The crane has an electromagnet as gripper which can perform horizontal and vertical translations. Horizontal mobility serves to cover the horizontal distance between the belts, while vertical
mobility is necessary because the belts are placed at different levels. The typical operation of the crane is as follows:
1. After the signal from the photoelectric cell indicates that a work piece has moved into the unloading area on the deposit belt, the gripper positions itself through horizontal and vertical translations over the deposit belt and picks up the metal plate.
2. The gripper transports the metal plate to the feed belt and unloads it there.
Efficiency considerations may lead a system designer to move the travelling crane back to the deposit belt at the end of this sequence so that incoming plates can be transported immediately.

2.2 Actuators and Sensors

In the previous section, the system and its operation have been described from an "object-oriented" perspective, in the broadest possible sense of the term. In this section, additional information is given from a different perspective, that of the control program responsible for driving the model.

2.2.1 Actuators

The system can be controlled using the following actions:
1. move the lower part of the press (electric motor);
2. extend and retract 1st robot arm (electric motor);
3. extend and retract 2nd robot arm (electric motor);
4. pick up and drop a metal plate with 1st arm (electromagnet);
5. pick up and drop a metal plate with 2nd arm (electromagnet);
6. rotate robot (electric motor);
7. rotate elevating rotary table (electric motor);
8. move elevating rotary table vertically (electric motor);
9. move gripper of travelling crane horizontally (electric motor);
10. move gripper of travelling crane vertically (electric motor);
11. pick up and drop a metal plate with gripper of travelling crane (electromagnet);
12. activate and deactivate feed belt (electric motor);
13. activate and deactivate deposit belt (electric motor).

2.2.2 Sensors

The control program receives information from the sensors as follows:
1. Is the press in its lower position? (switch)
2. Is the press in its middle position?
(switch)
3. Is the press in its upper position? (switch)
4. How far has 1st arm been extended? (potentiometer)
5. How far has 2nd arm been extended? (potentiometer)
6. How far has the robot rotated? (potentiometer)
7. Is the elevating rotary table in its lower position? (switch)
8. Is the elevating rotary table in its upper position? (switch)
9. How far has the table rotated? (potentiometer)
10. Is the travelling crane positioned over the deposit belt? (switch)
11. Is the travelling crane positioned over the feed belt? (switch)
12. What is the current vertical position of the gripper? (potentiometer)
13. Is there a metal plate at the extreme end of the deposit belt? (photoelectric cell)
14. Is there a metal plate at the extreme end of the feed belt? (photoelectric cell)

Both photoelectric cells switch on when a plate intercepts the light ray. Just after the plate has completely passed through it, the light barrier switches off. At this precise moment, the plate is in the correct position to be picked up by the travelling crane (sensor 13 of the deposit belt), or it has just left the belt to land on the elevating rotary table, provided of course that the latter machine is correctly positioned (sensor 14 of the feed belt).

While light barriers and switches provide a go/no-go kind of information, the potentiometer returns a value which, in the case of rotation for instance, is proportional to the angle.

2.3 Requirements

In a reactive system, one typically distinguishes between safety and liveness requirements. Obviously, the safety requirements are most important in this setting: if a safety requirement is violated, this might result in damage to machines, or, even worse, injury to people. The safety requirements are described in section 2.3.1, liveness properties are discussed in section 2.3.2, and the last section discusses other properties interesting in this context.

The requirements listed below should be viewed as a pool of ideas. This case
study allows for evaluating methods and approaches according to a wide spectrum of requirements, but not all properties can be formally proved to hold. We encourage contributors to prove representatives of the individual classes of properties, or to discuss whether or how certain kinds of properties can be expressed or verified using the method under consideration.

2.3.1 Safety requirements

The control program must make sure that various safety requirements are met. Each safety requirement is a consequence of one of the following principles:
• the limitations of machine mobility: the robot, for instance, would destroy itself if rotated too far; the press would damage itself if opened too far;
• the avoidance of machine collisions: the robot, for instance, would collide with the press if arm 1 were extended too far while pointing towards the press;
• the demand to keep metal blanks from being dropped outside safe regions: the robot, for instance, may deposit blanks only at a few places, and the feed belt has to make sure that the table is in the right position before transporting the blank too far;
• the necessity to keep the metal blanks sufficiently separate: light barriers, for instance, can distinguish two consecutive blanks only if they are sufficiently far apart.

Restrict machine mobility!
The electric motors associated with the actuators 1-3 and 6-10 (cf. section 2.2.1) may not be used to move the corresponding devices further than necessary. In detail:
• the robot must not be rotated clockwise, if arm 1 points towards the elevating rotary table, and it must not be rotated counterclockwise, if arm 1 points towards the press,
• both arms of the robot must not be retracted less than necessary for passing the press, and they must not be extended more than necessary for picking up blanks from the press,
• the press must not be moved downward, if sensor 1 is true, and it must not be moved upward, if sensor 3 is true,
• the elevating rotary table must not be moved downward, if sensor 7 is true, and it must not be moved upward, if sensor 8 is true,
• the elevating rotary table must not be rotated clockwise, if it is in the position required for transferring blanks to the robot, and it must not be rotated counterclockwise, if it is in the position to receive blanks from the feed belt,
• if the crane is positioned above the feed belt, it may only move towards the deposit belt, and if it is positioned above the deposit belt, it may only move towards the feed belt,
• the gripper of the crane must not be moved downward, if it is in the position required for picking up a work piece from the deposit belt, and it must not be moved upward beyond a certain limit.

To fulfil these restrictions, certain constants must be known. We refer to Appendix A.

Avoid machine collisions!
A couple of possible collisions are already avoided by simply obeying the above-mentioned restrictions on machine mobility. We do not mention these collisions in the following. Additionally, collision is possible and has to be avoided between the press and the robot, and between the crane and the feed belt:
• the press may only close when no robot arm is positioned inside it,
• a robot arm may only rotate in the proximity of the press if the arm is retracted or if the press is in its upper or lower position,
• the travelling crane is not allowed to knock against a belt laterally (this would happen if the travelling crane moved from the deposit belt to the feed belt without a simultaneous vertical translation),
• the travelling crane must not knock against a belt from above.

Again, we refer to Appendix A for the corresponding constants.

Do not drop metal blanks outside safe areas!

Metal blanks can be dropped for two reasons:
• the electromagnets of the robot arms or the crane are deactivated,
• a belt transports work pieces too far.

To avoid this, it suffices to obey the following rules:
• the magnet of arm 1 may only be deactivated, if the arm points towards the press and the arm is extended such that it reaches the press,
• the magnet of arm 2 may only be deactivated, if its magnet is above the deposit belt,
• the magnet of the crane may only be deactivated, if its magnet is above the feed belt and sufficiently close to it,
• the feed belt may only convey a blank through its light barrier, if the table is in loading position,
• the deposit belt must be stopped after a blank has passed the light barrier at its end and may only be started after the crane has picked up the blank.

Keep blanks sufficiently distant!
Errors occur if blanks are piled on each other, overlap, or even if they are too close to be distinguished by the light barriers. To avoid these errors, it suffices to obey the following rules:
• a new blank may only be put on the feed belt, if sensor 14 confirms that the last one has arrived at the end of the feed belt,
• a new blank may only be put on the deposit belt, if sensor 13 confirms that the last one has arrived at the end of the deposit belt,
• do not put blanks on the table, if it is already loaded,
• do not put blanks into the press, if it is already loaded,
• do not move the loaded robot arm 1 above the loaded table, if the latter is in unloading position (otherwise the two blanks collide).

2.3.2 Liveness properties

A very strong liveness property for this system is satisfied, if the following requirement is fulfilled:

    Every blank introduced into the system via the feed belt will eventually be dropped by the crane on the feed belt again and will have been forged.

There are many weaker forms of this liveness requirement.

2.3.3 Other requirements

Efficiency. It might be required that no blank stays longer than a certain amount of time in the production cell. The best result would be to prove that the implemented controller achieves the minimum possible time. To prove these properties it is necessary to remove the crane from the part of the system where time is measured.

Additionally, it can be required that the controller takes care that there are never fewer than a certain number of work pieces in the system, provided that there are enough blanks available.

Flexibility. The control software has to be as flexible as possible. The effort for changing the control software and proving its correctness must be as small as possible, when the requirements or the configuration of the cell change.

Questions

Several contributors found it helpful to consider the following questions during their work or while writing the documentation:
• Which properties have
been proved?
• Have assumptions about the architecture or the behavior of the production cell been made explicit and are they documented?
• How long, how complicated is the description? Is it understandable without deep knowledge of the method? Can it be discussed with a potential customer?
• How much effort was spent? Is the cost-benefit ratio balanced?
• Is it easy to change the controller? Can proofs be reused, or does a change in one part of the cell invalidate all proofs?
• How efficient is the controller? Does it achieve maximum possible throughput?
• Is it possible to draw conclusions on how the hardware design of the production cell could be improved? Would it be easier to prove certain properties, if additional sensors were added? Would it be easier to control the cell, if any other additional hardware were provided?

Acknowledgements

The author thanks Eduardo Casais for various suggestions and for providing the production cell clip-art from which all figures are adapted. Several contributors detected various errors and inconsistencies in earlier versions of this case study. Finally, thanks are due to Jochen Burghardt, who found the classification of safety requirements presented in section 2.3.1.

23 November 2010

Probably known to everyone
We will attempt an implementation in C
For the graphics part we will use the SDL library
Simple DirectMedia Layer, http://www.libsdl.org/
We will use bitwise operators and work with matrices

Multimedia library
Provides access to the system's video and audio resources
Contains functions for working with the keyboard and mouse
In short: useful for building games
Tutorial for working with SDL from Code::Blocks: http://wiki.codeblocks.org/index.php?title=Using_SDL_with_Code::Blocks

Displays a black screen for 5 seconds
Highlights the basic operations needed to use SDL

Required operations
• SDL initialization: function SDL_Init
• Video mode configuration (resolution and color depth): function SDL_SetVideoMode
• Main program loop (in our case a 5-second pause): function SDL_Delay
• Release of the resources used: function SDL_Quit

    #include <stdio.h>
    #include <stdlib.h>
    #include <SDL/SDL.h>  /* SDL 1.2 header; the path may differ by installation */

    int main(int argc, char **argv)
    {
        /* SDL initialization */
        if (SDL_Init(SDL_INIT_VIDEO) == -1) {
            printf("Failed to initialize SDL: %s\n", SDL_GetError());
            exit(EXIT_FAILURE);
        }
        atexit(SDL_Quit);

        /* video mode configuration: 640x480, 8 bits per pixel */
        SDL_Surface *screen = SDL_SetVideoMode(640, 480, 8, SDL_SWSURFACE);
        if (screen == NULL) {
            printf("Failed to set video mode: %s\n", SDL_GetError());
            exit(EXIT_FAILURE);
        }

        /* main program loop (5-second pause) */
        SDL_Delay(5000);

        /* release of the resources used */
        SDL_Quit();
        exit(EXIT_SUCCESS);
    }
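
The slides mention bitwise operators and matrices but do not show them in use. The following is a minimal sketch, not the author's implementation: it assumes a grid of on/off cells stored one bit per cell, with each row packed into a 32-bit word. The names ROWS, COLS, BitGrid, grid_clear_all, grid_set, grid_clear, grid_get and grid_count are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical grid size; each row is packed into one 32-bit word. */
#define ROWS 8
#define COLS 32

typedef struct {
    uint32_t row[ROWS]; /* bit c of row[r] is the cell at (r, c) */
} BitGrid;

/* Turn every cell off. */
void grid_clear_all(BitGrid *g)
{
    memset(g->row, 0, sizeof g->row);
}

/* Turn the cell at (r, c) on with a bitwise OR. */
void grid_set(BitGrid *g, int r, int c)
{
    assert(r >= 0 && r < ROWS && c >= 0 && c < COLS);
    g->row[r] |= (uint32_t)1 << c;
}

/* Turn the cell at (r, c) off with AND and a complemented mask. */
void grid_clear(BitGrid *g, int r, int c)
{
    assert(r >= 0 && r < ROWS && c >= 0 && c < COLS);
    g->row[r] &= ~((uint32_t)1 << c);
}

/* Return 1 if the cell at (r, c) is on, 0 otherwise. */
int grid_get(const BitGrid *g, int r, int c)
{
    assert(r >= 0 && r < ROWS && c >= 0 && c < COLS);
    return (int)((g->row[r] >> c) & 1u);
}

/* Count the cells that are on in the whole grid. */
int grid_count(const BitGrid *g)
{
    int n = 0;
    for (int r = 0; r < ROWS; r++) {
        /* bits &= bits - 1 clears the lowest set bit on each pass */
        for (uint32_t bits = g->row[r]; bits != 0; bits &= bits - 1) {
            n++;
        }
    }
    return n;
}
```

After grid_clear_all(&g) and grid_set(&g, 2, 5), a call to grid_get(&g, 2, 5) returns 1 and grid_count(&g) returns 1. Packing a row into one word keeps the grid compact, at the cost of limiting COLS to 32 in this sketch.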